overhangio / tutor-mfe

This plugin makes it possible to easily add micro frontend (MFE) applications on top of an Open edX platform that runs with Tutor.
GNU Affero General Public License v3.0

Document how to use BuildKit options to reduce resources consumption #125

Closed ghassanmas closed 1 year ago

ghassanmas commented 1 year ago

Context

By the Palm release, tutor/tutor-mfe is expected to require BuildKit to be enabled in Docker. This is already the case by default since Docker version 23, where BuildKit is the default builder^1.

BuildKit adds extra features to tutor/tutor-mfe, mainly cache-related ones. However, one of its main features is:

Parallelize building independent build stages ^1.

This would consume a lot of resources in the case of tutor-mfe, because it runs npm install and npm run build concurrently for each of the X MFEs enabled by tutor-mfe. This can lead to network-related errors for the former and high resource consumption for the latter.

Another concern: when the same machine is used both to deploy an Open edX instance and to build images, it would be quite risky for a low-resource machine to build the image while Tutor containers are also running. If the system crashes, it would affect the availability of the service.

Possible solution:

Note: These are not mutually exclusive.

  1. Configure BuildKit to use less resources as suggested by @regisb [^2] [^3]
  2. Make it optimal to use the BuildKit builder just when building tutor-mfe.
  3. Rethink the way MFEs are built/deployed, e.g.:

    • Use an external service to build each MFE separately. For example, this could follow the path in openedx/wg-devops/issues/14

Related issue/concern:

Also, in development mode, a developer typically needs to work on one specific MFE. However, tutor dev by default runs all MFEs in development mode, i.e. npm run start once for each enabled MFE. While this is a totally different issue, it's probably related.

Possible outcomes at least before palm release

[^2]: Docs https://docs.docker.com/build/buildkit/configure/#max-parallelism
[^3]: Slack thread https://openedx.slack.com/archives/CGE253B7V/p1684170597489729

regisb commented 1 year ago

Can you please try the following solution?

  1. Create a buildkit.toml configuration file with the following contents:

    [worker.oci]
      max-parallelism = 2

  2. Create a builder that uses that configuration:

    docker buildx create --use --name=max2cpu --driver=docker-container --config=./buildkit.toml

  3. Build the mfe image:

    tutor images build mfe

This solution seems to work for me, in the sense that I see only two layers built simultaneously.
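
The three steps above can be sketched as a single script. This is only an illustration: the MAX_PARALLELISM, BUILDER_NAME, CONFIG_FILE and APPLY variables and the run helper are hypothetical names introduced here, not part of Tutor or Docker. By default the script only writes the configuration file and prints the commands; it executes them when APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: limit BuildKit parallelism, then build the mfe image.
# All variable names below are assumptions made for this example.
set -eu

MAX_PARALLELISM="${MAX_PARALLELISM:-2}"
BUILDER_NAME="${BUILDER_NAME:-max2cpu}"
CONFIG_FILE="${CONFIG_FILE:-./buildkit.toml}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Step 1: write the BuildKit configuration file.
cat > "$CONFIG_FILE" <<EOF
[worker.oci]
  max-parallelism = $MAX_PARALLELISM
EOF

# Step 2: create a builder that uses that configuration and select it.
run docker buildx create --use --name="$BUILDER_NAME" \
  --driver=docker-container --config="$CONFIG_FILE"

# Step 3: build the mfe image with the selected builder.
run tutor images build mfe
```

Running it once without APPLY=1 is a cheap way to check the generated file before creating a builder.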

If this works for you as well, then I think that it should be the recommended approach, and we should add these instructions to the troubleshooting docs.

ghassanmas commented 1 year ago

Yes @regisb, it did what was expected: I can see it only does two things at a time.

arbrandes commented 1 year ago

I second @regisb's proposal. A nice and simple way to reduce resource usage.

ghassanmas commented 1 year ago

[Update]

I had to change it to use one worker when testing on a Mac M2 mini with 8GB of RAM.

While building, docker stats showed:

CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu           170.35%   868.6MiB / 7.765GiB   10.92%    470MB / 8.78MB    21.8MB / 3.01GB   46

CPU would fluctuate between 100-200%, I/O around 800MB, and PIDs could reach up to 60. That is when using one worker.

Otherwise I would get a crash error: npm killed something.

To use one worker, I had to update the file mentioned above and then run this command:

    docker buildx create --use --node=max2cpu --driver=docker-container --config=./buildkit.toml

The only difference is using --node, because the name already exists.
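
An alternative to reusing the node name, sketched here under the assumption that the old builder is no longer needed, is to remove it and recreate it under the same name with the updated configuration. The run helper and the APPLY variable are hypothetical and exist only so that the script prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: recreate the builder with max-parallelism = 1.
set -eu

BUILDER_NAME="${BUILDER_NAME:-max2cpu}"      # builder name used earlier in the thread
CONFIG_FILE="${CONFIG_FILE:-./buildkit.toml}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Update the configuration to a single worker.
cat > "$CONFIG_FILE" <<EOF
[worker.oci]
  max-parallelism = 1
EOF

# Remove the existing builder, then create it again with the new config.
run docker buildx rm "$BUILDER_NAME"
run docker buildx create --use --name="$BUILDER_NAME" \
  --driver=docker-container --config="$CONFIG_FILE"
```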

ghassanmas commented 1 year ago

Also, after the build is done, docker stats shows:

CONTAINER ID   NAME                              CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
8c34f988213e   buildx_buildkit_max2cpu           0.00%     1.569GiB / 7.765GiB   20.21%    811MB / 13.8MB    1.33GB / 8.45GB   25

Why is it still running? It makes sense that CPU is at 0% because the build is done, but it still consumes a lot of RAM.

I am not sure what magic buildx/the builder does, but I had to stop it with docker stop 8c34f988213e.

The builder is initially shown as inactive in docker buildx ls and does not show up in docker ps or docker stats as a running container until a build command is executed. The problem, again, is that even after the build is done, the builder container keeps running... maybe I had to wait for it to stop by itself; I couldn't find a relevant reference in the docs.

regisb commented 1 year ago

I had to change it to use one worker, when testing on Mac M2 mini with 8GB of RAM

Running with just two workers exceeds your 8GB of RAM??? This would mean that building a single MFE requires 4GB of RAM? If this is true then we really need to rename MFE to macrofrontends.

Why is it still running, it make sense that CPU is 0% because build is done, however it still consumes a lot of RAM.

Buildx is actually a process that runs inside a docker container -- as implied by the --driver=docker-container option you used. I suspect it's using some memory because it's doing garbage collection and other chores in the background. In my experience it's safe to remove the container, as it will automatically be restarted next time you run docker buildx.
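
As a sketch of how to release that memory without looking up the container ID, the builder can be stopped by name with docker buildx stop. The builder name max2cpu comes from the commands earlier in this thread; the run helper and the APPLY variable are hypothetical, so the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: stop an idle buildx builder once the build has finished.
BUILDER_NAME="${BUILDER_NAME:-max2cpu}"

# Hypothetical helper: print the command unless APPLY=1 is set.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "would run: $*"
  fi
}

# Stop the builder container; it should be started again on the next build.
run docker buildx stop "$BUILDER_NAME"

# List builders to check the status of the one that was stopped.
run docker buildx ls
```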

arbrandes commented 1 year ago

we really need to rename MFE to macrofrontends.

No objections from me. ;P

regisb commented 1 year ago

I changed the title of the issue to reflect the decision proposed in my earlier comment.

davidjoy commented 8 months ago

For those that end up at this PR trying to solve the following error when running tutor dev launch:

ERROR: failed to solve: process "/bin/sh -c npm clean-install --no-audit --no-fund --registry=$NPM_REGISTRY" did not complete successfully: exit code: 137
Error: Command failed with status 1: docker buildx build --tag=docker.io/overhangio/openedx-mfe:17.0.0-nightly --output=type=docker --cache-from=type=registry,ref=docker.io/overhangio/openedx-mfe:17.0.0-nightly-cache /Users/david/Library/Application Support/tutor-nightly/env/plugins/mfe/build/mfe
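
A side note on reading that exit code: by the usual convention, a status above 128 means the process was terminated by signal number (status - 128). The small sketch below does that arithmetic for 137. That the kill came from the system running out of memory is an assumption consistent with this thread, not something the error message itself states.

```shell
#!/bin/sh
# Sketch: decode an exit status that encodes a terminating signal.
exit_code=137                    # value taken from the error message above
signal=$((exit_code - 128))      # convention: 128 + N means killed by signal N

# kill -l N prints the name of signal number N.
echo "exit code $exit_code -> signal $signal ($(kill -l "$signal"))"
```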

The answer appears to be the parallelism situation described here. Documentation on how to fix it is now here:

https://github.com/overhangio/tutor-mfe?tab=readme-ov-file#mfe-development

Scroll down to the end of the "MFE Development" section, right before "Uninstall", and you'll find some steps to reduce the max-parallelism, which means the launch process will try to do way fewer things at once and, hopefully, succeed:

cat >buildkitd.toml <<EOF
[worker.oci]
  max-parallelism = 1
EOF
docker buildx create --use --name=singlecpu --config=./buildkitd.toml

(That file can be created anywhere, see the link for more details)