microsoft / rushstack

Monorepo for tools developed by the Rush Stack community
https://rushstack.io/
Other
5.95k stars 598 forks source link

[rush] AsyncRecycler on Windows conflicts with security scanner and sometimes fails to delete folders #2632

Open kaiyoma opened 3 years ago

kaiyoma commented 3 years ago

Summary

Full discussion with @octogonz here: https://rushstack.zulipchat.com/#narrow/stream/262513-general/topic/Rogue.20powershell.20process

Here are my takeaways on the async recycler:

  1. It sometimes doesn't start
  2. It runs pretty slowly, possibly because it's raising alarms for security tools
  3. The PowerShell window flash is a bit annoying (more minor UI issue)

Repro steps

On a Windows 10 machine, clean out the rush-recycler directory and run rush update -p. The recycler directory is filled with more files and disk usage than all of node_modules:

$ find common/temp/rush-recycler/ -type f | wc -l
138944

$ du -sh common/temp/rush-recycler/
650M    common/temp/rush-recycler/

$ find common/temp/node_modules/ -type f | wc -l
77540

$ du -sh common/temp/node_modules/
585M    common/temp/node_modules/

Related to item #1 above, I can see that the PowerShell process sometimes never starts. I don't know why. Is there a logfile I can check?

Related to item #2 above, the recycler process seems to "trip the alarm" for a security tool on my laptop (CoCoSys Endpoint Protector), which then ramps up its CPU and disk usage:

image

I'm willing to bet the security tool is doing a lot of scanning, which is going to slow down the file removal process even more.

Details

@octogonz suggested I try a test rimraf command locally and the results are much better. I was able to delete 100k files from the recycler directory in less than a minute. The PowerShell process would take closer to 5 minutes to do the same thing.

Standard questions

Please answer these questions to help us investigate your issue more quickly:

Question Answer
@microsoft/rush globally installed version? 5.43.0
rushVersion from rush.json? 5.43.0
useWorkspaces from rush.json? true
Operating system? Windows 10
Would you consider contributing a PR? No
Node.js version (node -v)? 14.15.0
kaiyoma commented 3 years ago

Proposed fix: replace all the PowerShell code with a Node-based solution, which is cross-platform and evidently faster.

octogonz commented 3 years ago

Proposed fix: replace all the PowerShell code with a Node-based solution, which is cross-platform and evidently faster.

Based on my own testing, I'm in favor of replacing PowerShell with Node.js for Windows. Anecdotally:

For MacOS/Linux we might want to do some perf measurements. The current implementation uses the native rm -f which might outperform Node.js. (?)

Even if PowerShell is moderately faster, a pure Node.js solution is easier to debug. And a PowerShell script deleting tons of files is understandably sketchy-looking to a security scanner like in @kaiyoma's case.

elliot-nelson commented 3 years ago

One possible caveat: we discovered rimraf had trouble on Windows machines with paths longer than MAX_PATH (260 characters). Back in the day your nested node_modules paths could sometimes go past this limit and then trying to rimraf those folders would end up generating errors.

The only way we found at the time to get around that issue was to use robocopy purge on Windows and rimraf on Mac/Linux. On Windows 10, it's possible that the MAX_PATH situation is resolved (and/or rimraf handles it better?). But it's worth testing an intentionally very long path on Windows vs Mac.

kaiyoma commented 2 years ago

Any updates here? I run into this problem pretty often and I have to manually kill the PowerShell process and delete the rush-recycler files myself, otherwise my system will thrash for 15-30 minutes with high CPU usage. I'm happy to be a Windows 10 beta tester if that helps move this along. 😄