vercel / turborepo

Build system optimized for JavaScript and TypeScript, written in Rust
https://turbo.build/repo/docs
MIT License

Turbo daemon uses 100% CPU even when no tasks are running #8122

Open AdiRishi opened 5 months ago

AdiRishi commented 5 months ago

Verify canary release

Link to code that reproduces this issue

N/A

What package manager are you using / does the bug impact?

pnpm

What operating system are you using?

Mac

Which canary version will you have in your reproduction?

1.13.4-canary.3

Describe the Bug

I recently noticed that after running a turborepo command in my monorepo, the CPU would stay at 100% even after the command had finished. I saw that the turbo process was still alive and kicking. After a bit of investigation, I realised that I could trigger this behavior by running the turbo daemon. I've included a video of the behavior I see. NOTE: I did install the latest canary version of turbo and tested with that; same behavior.

I tried to reproduce this on a new repository made with npx create-turbo@latest -e with-shell-commands; however, that DID NOT reproduce the issue. Running pnpm turbo daemon start in that repository did not cause the CPU to spike with a long-lived turbo process.
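For reference, the repro attempt looked roughly like this (the directory name below is just a placeholder for whatever create-turbo scaffolds):

```sh
# attempted (unsuccessful) reproduction in a fresh repo
npx create-turbo@latest -e with-shell-commands
cd my-turborepo   # placeholder; use the directory create-turbo actually created
pnpm turbo daemon start
```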

Given that I wasn't able to reproduce this in a new repository, I thought I should review whether I was doing something odd in my current repository; the only things I can think of are:

To Reproduce

https://github.com/vercel/turbo/assets/8351234/a4d74f2d-e1fc-46e0-9bb0-7c8960671bc4

Additional context

The next thing I was going to try was to delete my local copy of the repository, re-clone it, and set it up again to see whether the issue persists. However, I figured it may be better to file this issue first in case there are specific debugging steps that could reveal the source of the issue.

NicholasLYang commented 5 months ago

Hi @AdiRishi, thanks for the issue. Could you share the output of turbo daemon logs? You can also access the log file directly by running turbo daemon status and going to the log file path. If you're not comfortable sharing it here, you can also send it to me at nicholas.yang@vercel.com
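Concretely, something along these lines from the repo root (the exact log file path will differ per machine):

```sh
# show daemon status, including the path to its log file
turbo daemon status

# print the daemon's logs
turbo daemon logs
```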

AdiRishi commented 5 months ago

Logs seem pretty empty 🙃

https://github.com/vercel/turbo/assets/8351234/07176267-0a81-4d46-b9d4-9adfbeefe47e

NicholasLYang commented 5 months ago

Hmm very interesting. How large of a repository are you running inside of? And do you see any logs after a significant amount of time, say 10 minutes?

AdiRishi commented 5 months ago

> Hmm very interesting. How large of a repository are you running inside of? And do you see any logs after a significant amount of time, say 10 minutes?

I'd say it's a mid-size repository: around 21 sub-projects in total, a mix of around 6 webapps, 7 Cloudflare Workers, and then more utility libraries / configs, etc. Nothing crazy.

I'll run some further debugging and get the information you want. I'll also keep re-cloning and see if I can reproduce the issue on other systems. I'll get back to you on this.

NicholasLYang commented 5 months ago

Gotcha. Any chance you could run the daemon directly by stopping it (turbo daemon stop) then doing turbo daemon -vvv? This will run it in full verbosity. Hopefully that should give a better idea of where the daemon is stalling.
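In other words, roughly:

```sh
# stop the background daemon
turbo daemon stop

# run the daemon in the foreground with full verbosity
turbo daemon -vvv
```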

AdiRishi commented 5 months ago

Alright, I have some very interesting discoveries to go through.

First off, when I re-clone this repository into a different location and run the daemon from it, this behavior does not occur.

Next I tried to run turbo daemon -vvv on a different turborepo repository which doesn't exhibit this issue. Here was the output, seems fairly normal. The logs stopped after a few seconds. arishi-monorepo-daemon-logs.txt

I then ran turbo daemon -vvv on the problem repository, and the logs wouldn't stop. I've captured around 1 minute of logs in this file. The full logfile is around 25MB so I had to gzip it 😅 bad-monorepo-daemon-logs.txt.gz

I've captured both logfiles on Mac using a command like this: pnpm turbo daemon -vvv &> daemon-logs.txt.

Root Cause of Bug

Looking through the bad logs I realised there were mentions of a .git folder in workers/turborepo-remote-cache. This was confusing since I didn't think I had git submodules. I went into the directory, and sure enough, there is an inner git repository here with unstaged changes 🙃 . I think around 1 month ago I was updating my local copy of this worker and I accidentally left the git repository cloned and forgot to remove the .git folder. So it seems like having this unstaged change causes whatever the turbo daemon is doing to spin in an infinite loop.

I confirmed this by removing the unstaged changes and deleting the .git folder in the workers/turborepo-remote-cache folder, and everything is back to normal 🎉

Still, a very, very odd manifestation; it definitely does indicate a subtle bug in turbo, haha. I'm happy to help with more debugging if it will be helpful for fixing the underlying bug :)
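For anyone hitting the same thing, a quick way to check for stray nested git repositories is a plain find from the repo root (just a generic shell command, nothing turbo-specific):

```sh
# list any .git directories below the repository root, skipping node_modules
find . -mindepth 2 -type d -name .git -not -path "./node_modules/*"
```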

NicholasLYang commented 5 months ago

Thanks for the very thorough explanation! This should be a pretty easy fix. We currently filter out change events for the root .git folder, but we can probably extend that to be any nested .git folder too.

giorgiogross commented 5 months ago

+1 here, I had turbo spawning so many processes that I eventually ran into a "fork failed: resource temporarily unavailable" error in my terminal. After running a turbo command, the CPU usage would slowly creep up until the max process count for my system was reached.

(Screenshot attached: 2024-05-20 at 13:54:27)

I was in the process of consolidating my code into a monorepo and overlooked that there was a nested .git folder remaining. After removing it, turbo no longer seems to cause this issue.

hmnd commented 5 months ago

Related #3455

samhh commented 2 months ago

We've just had to disable the daemon in our repo as it was severely harming the performance of pnpm deploy, which copies files and the workspace's runtime dependencies to an isolated directory. I'm guessing it's a similar root cause to this issue.

Cypher1 commented 1 month ago

I think this might be caused by this other bug:

https://github.com/vercel/turborepo/issues/8932

karfau commented 1 month ago

Today I experienced it again after a long time without issues... I needed to kill the process.