vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo
MIT License

🐛 Bug: Turbo daemon creates / leaves a ton of `<defunct>` processes, accumulating enough sometimes to breach the OS-wide process limit, preventing the creation of any new processes. #9455

Open NullVoxPopuli opened 1 week ago

NullVoxPopuli commented 1 week ago

Verify canary release

Link to code that reproduces this issue

I think: all turbo projects running turbo while in interactive-rebase.

This is a pretty bad bug, because macOS has a limit of only ~5600 processes, and once you hit it, you can't spawn terminals, open apps, create new browser tabs, or even run ps.

You need to already have Activity Monitor (or similar) open so that you can kill the turbo daemon process. Otherwise you may be forced to reboot.

Which canary version will you have in your reproduction?

2.3.1-canary.0

Environment information

❯ pnpm turbo info
turbo 2.3.1-canary.0

CLI:
   Version: 2.3.1-canary.0
   Path to executable: <.pnpm>/turbo-darwin-arm64@2.3.1-canary.0/node_modules/turbo-darwin-arm64/bin/turbo
   Daemon status: Running
   Package manager: pnpm9

Platform:
   Architecture: aarch64
   Operating system: macos
   WSL: false
   Available memory (MB): 10455
   Available CPU cores: 12

Environment:
   CI: None
   Terminal (TERM): alacritty
   Terminal program (TERM_PROGRAM): unknown
   Terminal program version (TERM_PROGRAM_VERSION): unknown
   Shell (SHELL): /opt/homebrew/Cellar/bash/5.2.32/bin/bash
   stdin: false

Setup, check processes:

ps -ef | grep defunct | wc -l
# 1 or 2

Normally, a system should be running fewer than 1000 processes:

ps -ef | wc -l
# I usually hover around 600 to 800

Scenario A (inconsistent)

Scenario B (inconsistent)


Test:

ps -ef | grep defunct | wc -l
# 807

Test after upgrading to latest canary (noting that we run build in postinstall):

❯ ps -ef | grep defunct | wc -l
#    1435

I have an ongoing monitor for this running every second in a terminal that I just leave up all the time.

❯ watch -n 1 "echo \"All: \$(ps -ef | wc -l), Defunct: \$(ps -ef | grep defunct | wc -l)\""
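
A possible variant of the counters above, as a rough sketch: `grep defunct` can also match the `grep` process itself (which may be why the baseline reads 1 or 2), so counting by process state sidesteps that. The function name `count_zombies` is made up for illustration, and this assumes a `ps` that accepts `-A -o stat=`.

```shell
#!/usr/bin/env bash
# count_zombies (hypothetical helper): reads one process-state string per
# line on stdin and prints how many start with "Z" (zombie / <defunct>).
count_zombies() {
  awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Live usage (assumes ps supports -A and -o stat=):
#   ps -A -o stat= | count_zombies
```

The function only reads stdin, so it can be fed saved `ps` output as well as a live pipe.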

And with pstree we can see that these all come from turbo

# get a list of all unique parent processes for each defunct process
❯ ps -ef | grep defunct | awk '{print $3}' | sort -u

# pass each of these to pstree
while IFS= read -r pid; do
    pstree -p "$pid"
done < <(ps -ef | grep defunct | awk '{print $3}' | sort -u)

Which will print something like this:

-+= 00001 root /sbin/launchd
 \-+= 11557 $USER /opt/homebrew/opt/borders/bin/borders
   \--- 11558 $USER <defunct>
-+= 00001 root /sbin/launchd
 \-+= 43271 $USER <.pnpm>/turbo-darwin-arm64@2.2.3/node_modules/turbo-darwin-arm64/bin/turbo --skip-infer daemon
   |--- 43359 $USER <defunct>
   |--- 43361 $USER <defunct>
   # and many hundreds more
   \--- 57042 $USER <defunct>
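
The same per-parent breakdown could be sketched in one pass, with a count next to each parent PID, which makes the worst offender obvious without reading the whole tree. The function name `zombies_by_parent` is hypothetical. It expects two columns on stdin (parent PID, then process state), which is what `ps -A -o ppid= -o stat=` should produce.

```shell
#!/usr/bin/env bash
# zombies_by_parent (hypothetical helper): reads "PPID STAT" lines on stdin
# and prints "COUNT PPID" for every parent that has zombie children,
# sorted with the largest count first.
zombies_by_parent() {
  awk '$2 ~ /^Z/ { count[$1]++ }
       END { for (ppid in count) print count[ppid], ppid }' | sort -rn
}

# Live usage (separate -o options keep the empty headers portable):
#   ps -A -o ppid= -o stat= | zombies_by_parent
```

Each PPID in the output can then be passed to `pstree -p` as above to confirm which process it is.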

Expected behavior

No defunct processes should ever accumulate, because the OS does not clean them up until their parent reaps them or exits.

Actual behavior

Defunct processes are left lying around.

To Reproduce

It's possible this is reproducible in these OSS repos:

I somewhat regularly have to kill the top-level turbo daemon on Linux due to CPU usage. It's possible that this has the same root cause as the behavior I'm reporting here for macOS.

In both cases, Linux (where I do most of my OSS work) and Mac (where I do my closed-source, employer-owned work), killing the turbo daemon processes immediately makes my machines happier: it cleans up the defunct processes (macOS) or frees up CPU cycles (Linux).
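
For the kill step, here is a minimal sketch that picks turbo daemon PIDs out of `ps` output. The function name `turbo_daemon_pids` is hypothetical, and the match pattern is an assumption based on the `.../bin/turbo --skip-infer daemon` command line visible in the pstree output above. It will miss daemons whose executable path contains spaces.

```shell
#!/usr/bin/env bash
# turbo_daemon_pids (hypothetical helper): reads "PID COMMAND..." lines on
# stdin and prints the PID of every process whose executable is named
# "turbo" and whose last argument is "daemon".
turbo_daemon_pids() {
  awk '$2 ~ /(^|\/)turbo$/ && $NF == "daemon" { print $1 }'
}

# Live usage (sends the default SIGTERM to each match):
#   ps -A -o pid= -o command= | turbo_daemon_pids |
#     while read -r pid; do kill "$pid"; done
```

Running only the `ps ... | turbo_daemon_pids` part first shows which PIDs would be killed.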

Additional context

No response

wagenet commented 1 week ago

We've seen this on other developer machines at my company as well.

chris-olszewski commented 1 week ago

If either of you could share daemon logs (`turbo daemon status` should display the log file), that would be helpful. We should not be spawning child processes from the daemon.

NullVoxPopuli commented 1 week ago

Here is what I got:

❯ pnpm turbo daemon status
# ...
✓ daemon is running
log file: <repo>/.turbo/daemon/e224a4a441d772ef-turbo.log.2024-11-19
uptime: 16m 6s 566mss
pid file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.pid
socket file: /var/folders/wk/w99lck4x7_5930c7gj65r3s40000gp/T/turbod/e224a4a441d772ef/turbod.sock

ope, big file, there is a lot of text

```
There was a problem saving your comment. Your comment is too long (maximum is 65536 characters). Please try again.
```

oops 🙈 here is a file tho

output.txt

As I was poking around in here, I noticed a lot of activity from watchman cookies.

NullVoxPopuli commented 1 week ago

It seems this is happening nearly daily for me. I can't really pinpoint what is causing the defunct processes to show up. In Activity Monitor, I occasionally see more than 20 git processes spawn and then go away; maybe related? idk.