oxsecurity / megalinter

🦙 MegaLinter analyzes 50 languages, 22 formats, 21 tooling formats, excessive copy-pastes, spelling mistakes and security issues in your repository sources with a GitHub Action, other CI tools or locally.
https://megalinter.io
GNU Affero General Public License v3.0

Reduce container image pull time by supporting partial pulls #2715

Open sanmai-NL opened 1 year ago

sanmai-NL commented 1 year ago

Is your feature request related to a problem? Please describe.
Pulling in MegaLinter container images can take a long time, as noted in the docs.

Describe the solution you'd like
Experiment with eStargz lazy pulling.

Describe alternatives you've considered
All available alternatives have been explored and somewhat implemented, like splitting out images, using stages and caching.

Additional context
None.

sanmai-NL commented 1 year ago

See: https://www.redhat.com/sysadmin/faster-container-image-pulls. zstd:chunked is an alternative to eStargz.

nvuillam commented 1 year ago

I'm still pretty much a newbie when it comes to advanced Docker usage... I'll let the experts give their opinion ^^ cc @Kurt-von-Laven @bdovaz @echoix

echoix commented 1 year ago

I read a little about this over the weekend by following the links you provided. I agree that our kind of image would be a good candidate for that kind of optimisation. It merits a small proof of concept. Maybe zstd:chunked would have more success (since it compresses more).

Do you know if there is any more recent activity in this field since 2021? Most articles available for eStargz seem to be released in the same time frame.

In any case, the time spent supporting it would be worth it only if it can be used by GitHub Actions (and if it works better on a runner with two cores, the most common setup for MegaLinter).

Before reading, I thought it was another "Docker-optimizing SaaS" that dynamically removes files and uses a shallow image, pulling the rest once the container has already started. But that isn't the case: it seems to remain compatible with existing clients (without any advantage for them), since it is more of a metadata and client-side workaround. Both options still seem to work as OCI containers.

Either way, if you'd like to play with it a little and look at where it could be supported, I'll be glad to come back to it. The same goes for other usages of it in the wild.

sanmai-NL commented 1 year ago

@echoix The current advice is to use zstd:chunked rather than eStargz. I use it on a private GitLab instance, and its Container Registry seems to accept these images fine. However, Podman needs a setting that is off by default to be enabled before it can take advantage of partial pulls. Docker (dockerd) unfortunately seems to have fallen behind on this feature. Nevertheless, it is supported under, e.g., Kubernetes and other container engine products, so users of those products and MegaLinter will still be able to take advantage of it.

See https://www.slideshare.net/KoheiTokunaga/starting-up-containers-super-fast-with-lazy-pulling-of-images for a (somewhat dated) overview of this topic.

echoix commented 1 year ago

I did a little exploring and research in my spare moments this week. I learned that Docker is supposed to use pigz, if available, to decompress layers when pulling (i.e., when unpigz is available). I tried to see whether it could be used in GitHub Actions (on my fork), but didn't get a conclusive result, since the runner pulls the Docker image before I can execute a step to install pigz. With composite actions, I couldn't find the correct way to reference the Docker image, and running it manually seems to lack context: I don't think I passed the full command line and all the features GitHub Actions seems to provide, so it may not be a good idea for general usage.

If you want to test: on GitHub Actions, updating apt and installing pigz takes 9 seconds, and the baseline for pulling the MegaLinter Python beta flavor was 43-45 seconds, so for pigz to help at all, it would have to cut the pull by more than 9 seconds. I think I read in issues and discussions that some people saw something like an 18-23% improvement with multiple cores, but I can't find the source again :(. For the Python flavor, that is essentially no gain (43 × 0.23 = 9.89 s saved, against 9 s of setup), if it works at all. Also, decompression in Docker stays serial between layers, although a single layer can be decompressed in parallel.

In summary, configuring the Docker used by a GitHub-hosted runner for an action doesn't seem easily feasible.
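To make the break-even arithmetic explicit, here is a minimal Python sketch. It only uses the numbers quoted in this comment (a 43-second pull, a 9-second pigz install, and the unverified 18-23% speed-up); the function name `net_saving_seconds` is made up for illustration, and none of this was measured with pigz actually in use.

```python
def net_saving_seconds(pull_seconds: float, speedup_fraction: float, setup_seconds: float) -> float:
    """Seconds saved overall: time shaved off the pull minus the time spent on setup."""
    return pull_seconds * speedup_fraction - setup_seconds


# Numbers from the comment above; the 18-23% speed-up is a recollection, not a measurement.
best_case = net_saving_seconds(pull_seconds=43, speedup_fraction=0.23, setup_seconds=9)
worst_case = net_saving_seconds(pull_seconds=43, speedup_fraction=0.18, setup_seconds=9)

print(f"best case: {best_case:+.2f} s, worst case: {worst_case:+.2f} s")
```

A positive result means the install pays for itself. With these inputs the best case saves under a second and the worst case loses more than a second.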

Next, I looked at the status of zstd compression. It could be advantageous for both the size of the download and the speed of decompression. Other OCI container software implementations (other than Docker) seem to have had it figured out for a while. For Docker, support for pulling images whose layers are compressed with zstd finally became available in version 23.0.0 (released on 2023-02-01, https://docs.docker.com/engine/release-notes/23.0/#new). (Version 24.0.0 was already released on 2023-05-16.) But it is not yet installed on the Ubuntu 22.04 GitHub-hosted runners. They use Docker-Moby Client 20.10.25+azure-2 and Docker-Moby Server 20.10.25+azure-2 (https://github.com/actions/runner-images/blob/2ccef4f1005354274efa8c006927aec8601150d4/images/linux/Ubuntu2204-Readme.md#tools). A recent discussion talked about this: https://github.com/actions/runner-images/discussions/7770

If the path were clear, my idea was to use zstd-compressed layers as in https://aws.amazon.com/fr/blogs/containers/reducing-aws-fargate-startup-times-with-zstd-compressed-container-images/ or https://docs.docker.com/build/exporters/#compression. That means we would ship a fully OCI image, which doesn't seem to be a problem in itself; we are already playing with that for multi-platform image support and being able to load images locally. But we saw that at least GitHub runners wouldn't work for now, and even if they did, we would need to require recent versions of Docker (or any other container runtime) from everyone, and check many platforms.

Side note: that issue seems to indicate that Docker on macOS doesn't support zstd yet. I don't have a macOS computer available to check that claim, nor an M1/M2-based one to check further (and we know we have users running it locally on these platforms). But unless there is some reason macOS is different, it should have worked with version 23.0.0, released in February, before the issue was opened (at the end of April), so this is unexpected.
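For reference, a build with zstd-compressed layers could be assembled roughly as in the Python sketch below. The exporter options (`oci-mediatypes`, `compression`, `compression-level`, `force-compression`, `push`) are the ones described on the Docker exporters page linked above. The helper name `buildx_zstd_command` and the image reference `registry.example.com/megalinter-zstd:test` are placeholders for illustration, not anything MegaLinter uses. The sketch only builds and prints the command; it does not run Docker.

```python
import shlex


def buildx_zstd_command(image_ref: str, context: str = ".", level: int = 3) -> list[str]:
    """Return the argv for a buildx build that pushes an image with zstd-compressed layers."""
    output = ",".join(
        [
            "type=image",
            f"name={image_ref}",
            "oci-mediatypes=true",        # zstd layers need OCI media types
            "compression=zstd",
            f"compression-level={level}",
            "force-compression=true",     # recompress layers inherited from the base image too
            "push=true",
        ]
    )
    return ["docker", "buildx", "build", "--output", output, context]


# Hypothetical image reference, for illustration only.
cmd = buildx_zstd_command("registry.example.com/megalinter-zstd:test")
print(shlex.join(cmd))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with buildx and push rights to the registry.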

I saw a lot of promising tricks that could become available through the containerd image store, if Docker's transition continues as it is going. But for now it is beta/experimental and not enabled by default, and I don't think we can expect it from our users yet.

So overall, I'm now aware of what exists, but we might just be a little too early for mass usage.

nvuillam commented 1 year ago

Many thanks @echoix for your great analysis :)

echoix commented 1 year ago

I think we could close this for now; please ping us in another issue if the ecosystem changes and we haven't heard about it yet. At least I now know to keep an eye on this optimisation in the future. Your point has been made.

sanmai-NL commented 1 year ago

@echoix I'm mostly focused on user value in my proposal. If building and pushing a zstd:chunked compressed image is possible, the rest doesn't matter so much, does it? If needed at all, it would be easy to run Podman in CI without degrading the Docker Engine related GitHub Actions, wouldn't it? See e.g., https://github.com/redhat-actions/push-to-registry

echoix commented 1 year ago

Building and pushing such an image isn't a problem; I think it's quite possible to do so. But it wouldn't be consumable by our main source of usage, GitHub Actions. At least for now.

sanmai-NL commented 1 year ago

Oh, we use MegaLinter on GitLab CI.

By the way, you can run Podman from a container. Should be possible under GitHub Actions, too, no?

github-actions[bot] commented 11 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

mdrocan commented 11 months ago

A slim image like the one super-linter has could be useful. Of course, it would mean making compromises about what is included in the image and what is not.

sanmai-NL commented 11 months ago

Not stale.

echoix commented 11 months ago

> A slim-image like the super-linter has could be useful. Of course it would mean making compromises with what is included to the image and what not.

There are already multiple flavors for that, including ci_light and cupcake (a medium-small flavor for the most commonly used languages). There are flavors for projects in many different ecosystems (e.g., Python, documentation, security, JavaScript, Go, Rust, etc.).

If you use the full image at first and a flavor suits the files found in your repo, MegaLinter will suggest that you switch to that flavor; otherwise it will pre-fill an issue for you to suggest a new flavor with the relevant linter types.

mdrocan commented 11 months ago

Cool. Yes, I noticed the suggestions, but haven't had time to check them yet.

nvuillam commented 11 months ago

@mdrocan you can take your pick here :) https://megalinter.io/latest/flavors/

mdrocan commented 11 months ago

@nvuillam Yeah, I just quickly checked those; I'll have to try some of them out with some software :)

mdrocan commented 10 months ago

I have now tested a couple of images on different projects, and they seem to work nicely (they also reduce the execution time a lot). Having compared super-linter's slim version with the different images I tested, I think I know how to continue for now ;)

nvuillam commented 10 months ago

@mdrocan I'm glad the flavor fits your requirements :)

@sanmai-NL I'm open to building additional images in different formats and pushing them to other registries, as it would not impact the current architecture. Would you like to try a PR?

mdrocan commented 10 months ago

Yeah, it/they work, but I noticed an issue with Ansible. I will most likely need to file a bug for it once I have had time to test it a couple more times.

nvuillam commented 10 months ago

Ansible-lint's behaviour should be the same in all flavors; maybe it is an ansible-lint issue?

sanmai-NL commented 10 months ago

See: https://github.com/awslabs/soci-snapshotter. A way to reduce startup time without rebuilding images.

Kurt-von-Laven commented 10 months ago

> If you want to test, on GitHub actions, updating apt and installing pigz takes 9 seconds

@echoix, was this performance in the case of a cache hit or miss?

ubuntu-22.04 now ships with the latest Docker Client/Server 24.0.6, which supports zstd decompression. To your point about other platforms, my impression is that many MegaLinter users use Azure DevOps.

> Side note, that issue seem to indicate that on macOS, the Docker doesn't support zstd yet.

I can't find the mention about zstd decompression failing on macOS. Were you referring to actions/runner-images#7770?

echoix commented 10 months ago

> > If you want to test, on GitHub actions, updating apt and installing pigz takes 9 seconds
>
> @echoix, was this performance in the case of a cache hit or miss?

Euhm, that was the time to run apt-get update and apt-get install to install pigz. However, at that time I couldn't find a combination of action definitions that would let me install pigz before the Docker image is pulled. Since the action definition specifies that an image will be pulled, the runner pulls it before starting, so I couldn't influence the action's run environment.

> ubuntu-22.04 now ships with the latest Docker Client/Server 24.0.6, which supports zstd decompression. To your point about other platforms, my impression is that many MegaLinter users use Azure DevOps.

That's interesting, maybe time to take a new look at the situation, since it's quite a jump from 20.10.25 to 24.0.6. (In the beginning of August 2023/end of July 2023, a runner image with version 23.0.6 was released in between).

> > Side note, that issue seem to indicate that on macOS, the Docker doesn't support zstd yet.
>
> I can't find the mention about zstd decompression failing on macOS. Were you referring to actions/runner-images#7770?

The related issue mentioning that 23.0.6 was already included might mean that it's time to try again; maybe everything is there now. (I don't remember allllll the prerequisites/interdependencies by heart.)
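As a quick sanity check on the versions mentioned in this thread, a small Python sketch along these lines could compare a runner's Docker version string against the 23.0.0 release that, per the release notes cited earlier, added support for pulling zstd-compressed layers. The names `parse_docker_version` and `supports_zstd_pull` are invented for this example, and the threshold is taken from the thread rather than verified against every Docker build.

```python
import re

# Assumption from the thread: pulling zstd-compressed layers needs Docker 23.0.0 or newer.
ZSTD_PULL_MIN_VERSION = (23, 0, 0)


def parse_docker_version(version: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from strings such as '20.10.25+azure-2'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised Docker version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_zstd_pull(version: str) -> bool:
    """True if the version is at least the assumed minimum for zstd layer pulls."""
    return parse_docker_version(version) >= ZSTD_PULL_MIN_VERSION


# Version strings quoted in this thread.
for reported in ("20.10.25+azure-2", "23.0.6", "24.0.6"):
    print(reported, supports_zstd_pull(reported))
```

The version string itself would come from something like `docker version` on the runner. The sketch leaves that part out so it can run anywhere.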

Kurt-von-Laven commented 10 months ago

> Euhm, that was to run apt get update and apt get install to install pigz. However, I didn't manage at that time to find a combination of action definition that would allow me to install pigz before pulling the docker image. Since the action definition specifies that a image will be pulled, it pulls it before starting, so I didn't manage to influence the run environment of the action.

That sounds like the cache miss case then since it sounds like the apt updates and pigz weren't being cached. It probably isn't relevant now, but the easiest approach to testing pigz's performance would be to run MegaLinter via mega-linter-runner or as a pre-commit hook since neither approach pulls down the Docker image in a pre-step.

> That's interesting, maybe time to take a new look at the situation, since it's quite a jump from 20.10.25 to 24.0.6. (In the beginning of August 2023/end of July 2023, a runner image with version 23.0.6 was released in between).

Agreed!

Kurt-von-Laven commented 10 months ago

Also, actions/runner-images#8205 recently added pigz as a top-level dependency of the GitHub Actions hosted Ubuntu runner images; however, according to actions/runner-images#8161, it was previously present as a recommended package of Docker.

echoix commented 10 months ago

Oh well, that was a very specific issue/PR! Never thought of looking at that repo's issues for that.

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

If you think this issue should stay open, please remove the O: stale 🤖 label or comment on the issue.

sanmai-NL commented 8 months ago

/remove "O: stale"