ghassanmas closed this issue 1 year ago.
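Can you please try the following solution?
Create a buildkit.toml file with the following content:
[worker.oci]
max-parallelism = 2
Then create a builder that uses this configuration:
docker buildx create --use --name=max2cpu --driver=docker-container --config=./buildkit.toml
Build the mfe image:
tutor images build mfe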
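The steps above can be wrapped in two small shell functions so that the parallelism value is a parameter. This is only a sketch: write_buildkit_config and create_limited_builder are hypothetical helper names, not part of tutor or buildx. The only real commands used are cat and the same docker buildx create invocation shown above.

```shell
#!/bin/sh
# Sketch: helpers for building with a BuildKit builder that runs at most
# N build steps in parallel. Function names are hypothetical.

# Write a buildkitd config file.
# $1: output path (e.g. ./buildkit.toml), $2: max-parallelism value
write_buildkit_config() {
  cat > "$1" <<EOF
[worker.oci]
max-parallelism = $2
EOF
}

# Create a docker-container builder that uses the config and make it current.
# $1: builder name (e.g. max2cpu), $2: path to the config file
create_limited_builder() {
  docker buildx create --use --name="$1" --driver=docker-container --config="$2"
}
```

For example: write_buildkit_config ./buildkit.toml 2, then create_limited_builder max2cpu ./buildkit.toml, then tutor images build mfe. Passing the value as an argument makes it easy to drop to 1 on machines with less memory.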
This solution seems to work for me, in the sense that I see only two layers built simultaneously.
If this works for you as well, then I think that it should be the recommended approach, and we should add these instructions to the troubleshooting docs.
Yes @regisb, it did what was expected: I can see it only does two things at a time.
I second @regisb's proposal. A nice and simple way to reduce resource usage.
[Update]
I had to change it to use one worker when testing on a Mac mini M2 with 8GB of RAM.
docker stats output while building:
CONTAINER ID   NAME                      CPU %     MEM USAGE / LIMIT     MEM %    NET I/O          BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu   170.35%   868.6MiB / 7.765GiB   10.92%   470MB / 8.78MB   21.8MB / 3.01GB   46
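To capture single snapshots like the one above, rather than the continuously refreshing docker stats view, the call can be wrapped in a small function. Sketch only: show_builder_stats is a hypothetical helper name, and the container name buildx_buildkit_max2cpu is taken from the table above; it will differ for other builder names. The --no-stream and --format options and the .Name, .CPUPerc, .MemUsage and .PIDs placeholders are standard docker stats features.

```shell
#!/bin/sh
# Sketch: print one snapshot of a builder container's resource usage
# (name, CPU %, memory usage / limit, number of PIDs), then exit.
# $1: container name, e.g. buildx_buildkit_max2cpu (from the table above)
show_builder_stats() {
  docker stats --no-stream \
    --format '{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.PIDs}}' "$1"
}
```

Running it in a loop during a build (for instance while true; do show_builder_stats buildx_buildkit_max2cpu; sleep 5; done) gives a rough log of how CPU, memory and PIDs evolve.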
CPU would fluctuate between 100% and 200%, I/O around 800MB, and PIDs could reach up to 60. That is when using one worker.
Otherwise, the build would crash, with npm reporting that something was killed.
To use one worker, I updated the file mentioned above and then ran this command:
docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml
The only difference is using --node instead of --name, because a builder with that name already exists.
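An alternative to switching to --node is to remove the existing builder and create it again, so that the same name can be reused with the updated buildkit.toml. A minimal sketch, assuming (as this thread suggests) that the config file is only read when the builder is created; recreate_builder is a hypothetical helper name, while docker buildx rm and docker buildx create are real buildx subcommands.

```shell
#!/bin/sh
# Sketch: re-apply an edited buildkit.toml by recreating the builder under the
# same name. Assumes the config is only read when the builder is created.
# $1: builder name (e.g. max2cpu), $2: path to the config file
recreate_builder() {
  # Remove the old builder; ignore the error if it does not exist yet.
  docker buildx rm "$1" 2>/dev/null || true
  docker buildx create --use --name="$1" --driver=docker-container --config="$2"
}
```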
Also, docker stats output after the build is done:
CONTAINER ID   NAME                      CPU %     MEM USAGE / LIMIT     MEM %    NET I/O          BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu   0.00%     1.569GiB / 7.765GiB   20.21%   811MB / 13.8MB   1.33GB / 8.45GB   25
Why is it still running? It makes sense that CPU is at 0% because the build is done, but it still consumes a lot of RAM.
I am not sure what magic buildx/the builder does, but I had to stop it with docker stop 8c34f988213e.
The builder is initially inactive (see docker buildx ls) and does not show up as a running container in docker ps or docker stats until a build command is executed. The problem, again, is that even after the build is done, the builder container keeps running... Maybe I had to wait for it to stop by itself; I couldn't find a relevant reference in the docs.
I had to change it to use one worker, when testing on Mac M2 mini with 8GB of RAM
Running with just two workers exceeds your 8GB of RAM??? This would mean that building a single MFE requires 4GB of RAM? If this is true then we really need to rename MFE to macrofrontends.
Why is it still running, it make sense that CPU is 0% because build is done, however it still consumes a lot of RAM.
With the docker-container driver, BuildKit (the build engine that buildx talks to) actually runs inside a Docker container, as implied by the --driver=docker-container option you used. I suspect it's using some memory because it's doing garbage collection and other chores in the background. In my experience it's safe to remove the container, as it will automatically be restarted the next time you run docker buildx.
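Instead of docker stop with the container ID, the builder can also be managed through buildx itself. A sketch with hypothetical helper names: docker buildx stop stops the builder's container (buildx starts it again on the next build), while docker buildx rm deletes the builder altogether; switching back to the default builder first keeps later docker buildx commands pointing at an existing builder.

```shell
#!/bin/sh
# Sketch: free the memory held by an idle builder. Helper names are hypothetical.
# $1: builder name (e.g. max2cpu)

# Stop the builder's BuildKit container; buildx boots it again on the next build.
stop_builder() {
  docker buildx stop "$1"
}

# Delete the builder entirely, switching back to the default builder first.
remove_builder() {
  docker buildx use default
  docker buildx rm "$1"
}
```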
we really need to rename MFE to macrofrontends.
No objections from me. ;P
I changed the title of the issue to reflect the decision proposed in my earlier comment.
For those who end up at this issue trying to solve the following error when running tutor dev launch:
ERROR: failed to solve: process "/bin/sh -c npm clean-install --no-audit --no-fund --registry=$NPM_REGISTRY" did not complete successfully: exit code: 137
Error: Command failed with status 1: docker buildx build --tag=docker.io/overhangio/openedx-mfe:17.0.0-nightly --output=type=docker --cache-from=type=registry,ref=docker.io/overhangio/openedx-mfe:17.0.0-nightly-cache /Users/david/Library/Application Support/tutor-nightly/env/plugins/mfe/build/mfe
The answer appears to be the parallelism situation described here. Documentation on how to fix it is now here:
https://github.com/overhangio/tutor-mfe?tab=readme-ov-file#mfe-development
Scroll down to the end of the "MFE Development" section, right before "Uninstall", and you'll find some steps to reduce the max-parallelism, which means the launch process will try to do way fewer things at once and, hopefully, succeed:
cat >buildkitd.toml <<EOF
[worker.oci]
max-parallelism = 1
EOF
docker buildx create --use --name=singlecpu --config=./buildkitd.toml
(That file can be created anywhere, see the link for more details)
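The exit code: 137 in the error above follows the usual shell convention that a status above 128 means the process was killed by signal number (status - 128): 137 - 128 = 9, i.e. SIGKILL, which during a memory-hungry npm step most likely comes from the kernel's out-of-memory killer. A small sketch to decode such statuses (decode_exit_status is a hypothetical helper name; kill -l is a standard shell builtin):

```shell
#!/bin/sh
# Sketch: decode a process exit status. By shell convention, a status above 128
# means the process was terminated by signal number (status - 128).
decode_exit_status() {
  if [ "$1" -gt 128 ]; then
    sig=$(( $1 - 128 ))
    # "kill -l N" prints the signal name without the SIG prefix, e.g. KILL.
    echo "signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exit code $1"
  fi
}
```

For example, decode_exit_status 137 prints "signal 9 (SIGKILL)".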
Context
As of Palm, tutor/tutor-mfe is expected to require BuildKit to be enabled in Docker. This is already the case by default since Docker 23.0, where BuildKit is the default builder^1.
BuildKit adds extra features to tutor/tutor-mfe, mainly cache-related. However, one of its main features, concurrent building, would consume a lot of resources in the case of tutor-mfe, given that it would run
npm install
and npm run build
concurrently for the X MFEs that are enabled by tutor-mfe. This can lead to network-related errors for the former and high resource consumption for the latter. Another concern is the case where the same machine that is used to deploy an Open edX instance is also used for building: it would be quite risky for a low-resource machine to build the image while tutor containers are also running, i.e. if the system crashes, it would affect the availability of the service.
Possible solutions:
Note: these are not mutually exclusive.
Rethink the way MFEs are built/deployed, i.e..
Related issue/concern:
Also, in development mode, it has been observed that a developer typically needs to work on one specific MFE, yet tutor dev runs all MFEs in development mode by default, i.e. it runs
npm run start
once for each of the X enabled MFEs. While this is a different issue, it's probably related.
Possible outcomes, at least before the Palm release:
[^2]: Docs https://docs.docker.com/build/buildkit/configure/#max-parallelism
[^3]: Slack thread https://openedx.slack.com/archives/CGE253B7V/p1684170597489729