Apparent crash in Windows WSL

jeffrson commented 1 year ago

Describe the bug: There appears to be a race (at least when run in a WSL terminal). It works in Windows (PowerShell) and Linux (Bash via PuTTY). In our mono-repo, all projects build fine individually (yarn run build inside the workspace). However, with yarn.build about 5 out of 35 projects fail to build. When I run yarn build again, those 5 projects build successfully. Sometimes the elapsed-time display stops updating for several seconds, and sometimes yarn.build stops completely without any message, leaving some projects unbuilt and an empty yarn.build.json.

Also, with yarn build -i -v I cannot see any additional output.

Finally, I cannot tell which version it is (from yesterday; I tried to update today), and I cannot try an older version (due to availability). But IMO this is new; it used to work.

To Reproduce: maybe only locally reproducible

Expected behavior: Build the mono-repo in WSL just as in Linux.

Desktop (please complete the following information):

OS: WSL (1.0.0)

Additional context: How can I provide more information? What should I try?

Edit: yarn.build-error.log is empty.

Edit2: Works with '-m 4' on my octa-core processor; with '-m 8', "only" 2 instead of the former 5 projects fail to build. So I need to retest on native Linux, because the VM only has 4 cores anyway.

Edit3: Works natively in Linux on the same machine (8 cores, 16 slots).

ojkelly commented 1 year ago

Thanks for the detailed report. It sounds like it might be locking up at some point, or we might be causing excessive context switching if, say, one of the build steps is multi-threaded.

Can you try limiting the concurrency under WSL?

Yarn.build tries to use all your cores to build, but its approach is reasonably naive at the moment. For example, I’ve found on an M1 Mac (with 4 high-performance cores and 4 high-efficiency cores) that a concurrency of 4 is almost the same as 8, presumably because the work is scheduled onto the high-performance cores.

I’ve mused on the idea of making the default concurrency 1 less than your CPU count when not in CI, but I haven’t had a chance to thoroughly test the idea.
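As a rough illustration of the "1 less than your CPU count when not in CI" idea, here is a minimal sketch. The helper name `defaultConcurrency` is hypothetical and not part of yarn.build; detecting CI via `CI=true` is also an assumption made for this example.

```typescript
import * as os from "node:os";

// Hypothetical helper, not part of yarn.build: pick a default worker count.
function defaultConcurrency(logicalCpus: number, isCI: boolean): number {
  // In CI, use every logical CPU; locally, leave one free for the rest of the system.
  const wanted = isCI ? logicalCpus : logicalCpus - 1;
  // Never drop below one worker, even on a single-CPU machine.
  return Math.max(1, wanted);
}

// os.cpus() lists logical CPUs, so hyperthreads are counted individually.
const concurrency = defaultConcurrency(os.cpus().length, process.env.CI === "true");
console.log(`default concurrency: ${concurrency}`);
```

Keeping the calculation in a pure function makes it easy to try other policies (for example, half the logical CPUs) without touching the code that spawns builds.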

All previous versions are available on the GitHub releases page: https://github.com/ojkelly/yarn.build/releases

You can download and replace the file in .yarn/plugins/@ojkelly/yarn.build directly to downgrade.

If you can find a version where this doesn’t happen, we should be able to find the source of the bug.

jeffrson commented 1 year ago

Yes, indeed (wrote it in "edit") - it seems the behaviour changes with less concurrency.

Nevertheless, my CPU has 8 cores, but os.cpus() reports 16 (including hyperthreading, obviously). It would be great if you could determine the real number of cores.
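One possible way to get at the physical core count on Linux (including WSL) is to parse /proc/cpuinfo. The sketch below is only an illustration: `countPhysicalCores` is a hypothetical helper, and it assumes the x86-style "physical id" and "core id" fields are present, which is not the case on every architecture.

```typescript
// Count physical cores from the text of /proc/cpuinfo.
// Assumption: each logical CPU block carries "physical id" and "core id" lines.
function countPhysicalCores(cpuinfo: string): number {
  const cores = new Set<string>();
  // The blocks for the individual logical CPUs are separated by blank lines.
  for (const block of cpuinfo.split(/\n\s*\n/)) {
    const physical = /^physical id\s*:\s*(\d+)/m.exec(block);
    const core = /^core id\s*:\s*(\d+)/m.exec(block);
    if (physical && core) {
      // Two hyperthreads of one core share the same (socket, core) pair.
      cores.add(`${physical[1]}:${core[1]}`);
    }
  }
  return cores.size;
}

// Made-up sample: 4 logical CPUs on 2 hyperthreaded cores of a single socket.
const sample = [
  "processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0",
  "processor\t: 1\nphysical id\t: 0\ncore id\t\t: 0",
  "processor\t: 2\nphysical id\t: 0\ncore id\t\t: 1",
  "processor\t: 3\nphysical id\t: 0\ncore id\t\t: 1",
].join("\n\n");

console.log(countPhysicalCores(sample)); // prints 2
```

In real use the input would come from reading /proc/cpuinfo as UTF-8 text. The function returns 0 when the expected fields are missing, so a caller would need to fall back to the logical CPU count in that case.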

I'll try with an older version. Could you have a look at why '-i' and '-v' don't output anything useful?

jeffrson commented 1 year ago

I found that it works up to 3.4.9, but fails from 3.5.0.

BTW, in

Up to date: 0
Fail: 0
Skipped: 2
Excluded: 0
Total: 35

I don't know which the 2 skipped packages are - there shouldn't be any. Could this be related? Is there any way to know which packages are skipped?

ojkelly commented 1 year ago

> I found that it works up to 3.4.9, but fails from 3.5.0.

Thanks for checking the versions. For reference, this is the diff between 3.4.9 and 3.5.0 https://github.com/ojkelly/yarn.build/compare/v3.4.9...v.3.5.0

One thing of note in that change was a loosening of how we [handle exit codes](https://github.com/ojkelly/yarn.build/compare/v3.4.9...v.3.5.0#diff-1a913acddb70c7160a82a86bfb2623203aef69ab33a779fdd2d3078a12c2ab1fL1276).

Given this only fails on PowerShell, I wonder if we need to add some special handling for PowerShell's exit codes?

Essentially, it's set up so that anything non-zero is a failure. My understanding is that Linux/Unix/PowerShell all agree with that.
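To make the "anything non-zero is a fail" rule concrete, here is a minimal sketch of how the result of a child process could be classified in Node. It is not the code from the linked diff; `classifyExit` and `ExitResult` are hypothetical names. The one fact relied on is that Node's child_process "exit" event reports a numeric code, or a null code together with a signal name when the process was terminated by a signal.

```typescript
type ExitResult =
  | { ok: true }
  | { ok: false; reason: string };

// Hypothetical helper: interpret the (code, signal) pair from a child
// process "exit" event. Only exit code 0 counts as success.
function classifyExit(code: number | null, signal: string | null): ExitResult {
  if (code === 0) {
    return { ok: true };
  }
  if (code !== null) {
    return { ok: false, reason: `exited with code ${code}` };
  }
  // A null code means the process did not exit on its own; it was
  // terminated by a signal (for example SIGKILL).
  return { ok: false, reason: `terminated by signal ${signal ?? "unknown"}` };
}

console.log(classifyExit(0, null));
console.log(classifyExit(2, null));
console.log(classifyExit(null, "SIGKILL"));
```

The null-code branch is the easy one to miss: a comparison such as `code > 0` would treat a signal-terminated build as a success.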

> I don't know which the 2 skipped packages are - there shouldn't be any. Could this be related? Is there any way to know which packages are skipped?

Try running in CI mode, which prints a log instead: CI=true yarn build

The .yarn/yarn.build.json keeps track of the results, and is useful to look at.

However, I would have expected Up to date: 33, given that 2 out of 35 were skipped.

Using 3.5.0:

Can you delete .yarn/yarn.build.json and then run yarn build --dry-run? It should print something like this, which shows the dependency graph and run order (the number in brackets).

[ Run Order ]-------------------------------------------------------------------
├─[0] packages/examples/words/adipiscing
├─[0] packages/examples/words/amet
├─[0] packages/examples/words/consectetur
├─[0] packages/examples/words/dolor
├─[0] packages/examples/words/elit
├─[0] packages/examples/words/ipsum
├─[0] packages/examples/words/lorem
├─[0] packages/examples/words/sit
├─[0] packages/examples/words/quitter
├─[0] packages/plugins/shared
└─[0] packages/plugins/plugin-package-yaml
  ├─[1] packages/examples/phrases/in-hac
  ├─[1] packages/examples/phrases/nullam-risus
  ├─[1] packages/plugins/plugin-build
  └─[1] packages/plugins/plugin-bundle
    ├─[2] packages/examples/phrases/lorem-ipsum
    └─[2] packages/plugins/plugin-test
      ├─[3] packages/examples/lorem-ipsum
      ├─[3] packages/examples/lorem-ipsum-docker
      └─[3] packages/plugins/plugin-all
[ Dry Run / Command: build / Total: 20 ]--------------------------[ yarn.build ]

And then, can you do it again after running yarn build? If they all built successfully, it will be empty.

jeffrson commented 1 year ago

Well, it works (for me) in PowerShell (Windows - Linux not tried), but doesn't in WSL Bash.

I already had a look at the diff, but it seems large and I cannot really identify where the problem might be. I'll try CI mode as soon as possible.

ojkelly commented 1 year ago

Yep sorry, just re-reading your original post where you covered off a heap of the questions.

Could this be an issue specific to WSL 1? I wonder if it also occurs on WSL 2.

To rule out issues specific to your codebase, can you clone this repo and run yarn build? The example folder contains a contrived but simple set of packages that makes a good baseline check.

jeffrson commented 1 year ago

I'm using WSL2 - here's the version info:

$ wsl -v
WSL-Version: 1.0.0.0
Kernelversion: 5.15.74.2
WSLg-Version: 1.0.47
MSRDC-Version: 1.2.3575
Direct3D-Version: 1.606.4
DXCore-Version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows Version: 10.0.19044.2311
$ wsl -l -v
  NAME                   STATE           VERSION
* Debian                 Running         2
  docker-desktop-data    Stopped         2
  docker-desktop         Stopped         2

CI=true stops after a couple of packages (probably when the first failure occurs) without any message (shows the last successfully built package I assume). It shows the dependency graph, but the next yarn build after the failure starts over from the beginning.

I'm on "latest" again for these tests.

jeffrson commented 1 year ago

New test now: stop at [ 23:0/35 1m 10.97s ]---------------------------------------------[ yarn.build ] without any yarn.build.json (before and after).

jeffrson commented 1 year ago

BTW, yarn build in the clone of this repository works fine.

jeffrson commented 1 year ago

It seems I have found the issue, and it is not a bug in the usual sense. Starting with v3.5.0, yarn.build seems to need a lot more memory (latest even more). 3.4.9 builds my project with about 10GB (8GB RAM + 2GB swap), and 3.5.0 builds it with ~11.5GB (10GB RAM + 1.5GB swap); with latest, however, even 14GB (10GB RAM + 4GB swap) is not sufficient. When I increase the memory of the VM (which unfortunately is only possible for tests), I can build successfully.

It's not just my project - I tried with "ultra-runner" (ultra -r --build) which builds fine even with 8GB RAM/2GB Swap (I still have to check exact needs).
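Given the memory figures above and the earlier observation that '-m 4' works, one workaround is to cap concurrency by available memory as well as by CPU count. The following is only a sketch under stated assumptions: `memoryBoundConcurrency` is a hypothetical helper, and the 2 GiB per-build budget is a made-up number that would have to be measured for a real project.

```typescript
import * as os from "node:os";

// Hypothetical helper: limit parallel builds by a memory budget.
// perJobBytes is an assumed peak per package build, not a measured value.
function memoryBoundConcurrency(
  cpuLimit: number,
  availableBytes: number,
  perJobBytes: number,
): number {
  // How many builds fit into the memory that is currently available.
  const byMemory = Math.floor(availableBytes / perJobBytes);
  // Take the stricter of the two limits, but always allow one build.
  return Math.max(1, Math.min(cpuLimit, byMemory));
}

const GiB = 1024 ** 3;
const limit = memoryBoundConcurrency(os.cpus().length, os.freemem(), 2 * GiB);

// The result can be passed to the -m option mentioned earlier in the thread.
console.log(`yarn build -m ${limit}`);
```

With 10 GiB available and a 2 GiB budget this yields at most 5 parallel builds, regardless of how many logical CPUs the machine reports.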

ojkelly / yarn.build

Apparent crash in Windows WSL #243