osrf / docker_images

A repository to hold definitions of docker images maintained by OSRF
Apache License 2.0

Discussion on Github Actions, Packages, and Buildah #723

Open sloretz opened 4 months ago

sloretz commented 4 months ago

Given the effort that's been made to try to solve https://github.com/osrf/docker_images/issues/112, I looked at creating images with Github Actions using buildah and hosting them on Github Packages. It seems like a pretty good option to me, so I'm opening a ticket to discuss whether the official images could benefit from any of it.

https://github.com/sloretz/ros_oci_images

About Github Packages I learned:

About buildah I learned:

sloretz commented 4 months ago

Oh, and how it solves #112: A github action checks every 6 hours if the desktop-full variant for a ROS distro could be updated. If so it rebuilds all the images for that distro. I also added a job that runs once a week to rebuild all images to catch non-ROS package updates.
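
The update check described above can be sketched in Python. This is not the code from ros_oci_images; the function name `needs_rebuild`, the two package-to-version dictionaries, and the newer version string are assumptions made for illustration. The sketch only shows the comparison step: a rebuild is triggered when any package recorded in the last image has a different version in the apt repository.

```python
def needs_rebuild(installed: dict[str, str], available: dict[str, str]) -> bool:
    """Return True if any package in the image has a different version upstream.

    installed: package name -> version recorded in the last built image
    available: package name -> version currently offered by the apt repo
    Both mappings are hypothetical inputs; how they are collected is out of scope.
    """
    for name, version in installed.items():
        candidate = available.get(name)
        # A package missing upstream is ignored; a differing version triggers a rebuild.
        if candidate is not None and candidate != version:
            return True
    return False


image = {"ros-rolling-ros-core": "0.10.0-2jammy.20240216.184241"}
repo_same = dict(image)
# Made-up newer version, for illustration only.
repo_new = {"ros-rolling-ros-core": "0.10.0-2jammy.20240301.101500"}

print(needs_rebuild(image, repo_same))  # False
print(needs_rebuild(image, repo_new))   # True
```

A scheduled job would call something like this every 6 hours and, separately, rebuild unconditionally once a week to pick up non-ROS package updates.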

mikaelarguedas commented 4 months ago

Thanks @sloretz !

To clarify in 2 lines, the suggestion here is:

Is that correct?

(for the following I'm speaking under @ruffsl and @sloretz control, do not hesitate to clarify / correct the statements below)


Current state of things:


Pros of the approach suggested in this ticket:

Cons:

sloretz commented 4 months ago

To clarify in 2 lines, the suggestion here is: [...] Is that correct?

I'd say the one line summary is "look at this cool thing I made", but more concretely I think some useful changes here would be:

ruffsl commented 3 months ago

To move away from dockerhub and the official docker library completely

This would be a major change to our infrastructure, not to mention perhaps a bit disruptive or confusing for new and old end users alike. I'm not totally for, nor against, but it would be a serious undertaking regardless.


Logistically for infrastructure,

OSRF's github org would probably require significantly more resource credits to perpetually build and host all official ros docker images. E.g. GitHub Action credits for running CI jobs and GitHub container registry pull counts. Currently, a lot of this infrastructure overhead we offload to DockerHub, who operate the multi architecture docker engines to build our images (for every supported platform), while also hosting the docker image registry (with enough bandwidth and unrestricted aggregate public pull count quotas) for the image repo.

Even for unofficial images under OSRF's own DockerHub org, we still currently benefit from being enrolled in Docker's Sponsored Open Source Program, which extends similarly less restrictive pull quotas to our GUI based images as well. @tfoote can speak to the details on the time and effort it took to get that all initiated and finalized.

I'm guessing GitHub has similar sponsorship programs for open source projects with larger resource usage requirements, but we'd probably want to check into it and get that ball rolling quickly if we decide to migrate, lest we just as quickly get rate limited when building or hosting our ROS images using only GitHub.


From a user perspective,

DockerHub has the historic (perhaps controversial) benefit of being the default image registry for most out of the box container tools. As such, switching registries would of course require users to update all Dockerfile directives, build ARGs, build scripts, CLI muscle memory, etc, to include the added domain name of GitHub's own container registry URI. For example, common approaches such as this would have to be updated everywhere:

ARG ROS_DISTRO=rolling
FROM ros:$ROS_DISTRO

ros is not an official docker image anymore (-> impact?) what to do with all images on dockerhub

That is to say, to make migration simple, we'd probably want to mirror all past ROS images on GitHub's container registry as well. But then to avoid breaking legacy setups, we'd have to leave up the ROS images already published on DockerHub's official image library, not that Docker Hub librarians would let maintainers yank archived images anyhow. Yet this duplication would probably cause a lot of confusion as image tags fall out of sync, regardless of public announcements or deprecation notices posted, given how most folks use official images via the CLI and rarely return to check a repo's webpage for a library image.

ruffsl commented 3 months ago

I looked at creating images with Github Actions using buildah and hosting them on Github Packages.

If switching to Github Actions anyway, what would be the difference between using buildah vs. Github's official "Build and push Docker images" action? Is it the aspect of building images via CLI and scripts? Any advantage compared to buildkit's native CLI or full Python SDK? Although, I don't think the Python SDK has full support yet for buildkit.

For example, here is the github action I wrote to efficiently rebuild the Nav2 CI image using buildkit and image layer caching:

Where every day it checks the image to see if any ros packages are updatable:

ruffsl commented 3 months ago

Change template system to use Dockerfiles with build arguments. There would be some repetition, but I think it would make the images easier to contribute to.

A reason we haven't yet adopted the use of build ARGs is the limits of the official images' review pipelines, or how they don't (didn't?) support variable substitution.

Another reason is probably the question of how to set non-default build ARG values via the docker library manifest, which expects a single/static Dockerfile, or AFAIK...

On that last point, hypothetically, if we did migrate away from Docker Hub's official library, we could probably make great use of modern multi-stage builds, where each ROS meta package tag could be in-lined as separate multi stages in a single Dockerfile. This would avoid the need of complex makefiles, and cut down on the number of Dockerfiles and folders.

This is because I think Docker Hub's official image library Instruction Format doesn't yet allow for the added specification of a --target stage when mapping tags to Dockerfiles.
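
To illustrate the multi-stage idea, assume a single Dockerfile whose stages are named after ROS meta packages. Each tag could then be built by selecting a stage with `docker build --target`. The stage names, the image name, and the helper `build_commands` below are hypothetical; the sketch only assembles command lines and does not run them.

```python
def build_commands(image: str, distro: str, stages: list[str],
                   dockerfile: str = "Dockerfile") -> list[list[str]]:
    """Build one `docker build` argv per stage of a multi-stage Dockerfile."""
    commands = []
    for stage in stages:
        tag = f"{image}:{distro}-{stage}"
        commands.append([
            "docker", "build",
            "--file", dockerfile,
            "--build-arg", f"ROS_DISTRO={distro}",
            "--target", stage,  # select the stage to export as this tag
            "--tag", tag,
            ".",
        ])
    return commands


# Hypothetical stage names, one per meta package.
for cmd in build_commands("ros", "rolling", ["ros-core", "ros-base", "perception"]):
    print(" ".join(cmd))
```

Because later stages can start `FROM` earlier ones, the shared layers would be built once and reused across tags.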

ruffsl commented 3 months ago

Instead, how much more difficult would it be to just bring ROS's packaging versioning scheme closer to something more easily cacheable, yet busted after periodic syncs? E.g. denoting the sync as a first class marker in the package version string:

Package: ros-rolling-ros-core
-Version: 0.10.0-2jammy.20240216.184241
+Version: 0.10.0-2jammy.sync-42.20240216.184241
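
If versions did carry such a marker, tooling could key its cache on the sync number alone. The Python sketch below assumes the `.sync-<N>.` format from the diff above, which is only a proposal and not how ROS packages are versioned today; the names `sync_id` and `cache_is_stale` and the `sync-43` example are made up for illustration.

```python
import re

# Matches the hypothetical ".sync-<N>." component proposed above.
SYNC_RE = re.compile(r"\.sync-(\d+)\.")


def sync_id(version: str):
    """Return the sync number embedded in a version string, or None if absent."""
    match = SYNC_RE.search(version)
    return int(match.group(1)) if match else None


def cache_is_stale(cached_version: str, repo_version: str) -> bool:
    """Treat the cache as stale only when the sync number differs."""
    return sync_id(cached_version) != sync_id(repo_version)


old = "0.10.0-2jammy.sync-42.20240216.184241"
new = "0.10.0-2jammy.sync-43.20240216.184241"  # made-up next sync
print(sync_id(old))              # 42
print(cache_is_stale(old, old))  # False
print(cache_is_stale(old, new))  # True
```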

Given the points made in this comment, I guess this would still bust the build cache for all architectures simultaneously, but I suppose I'd still trade that off to avoid having to hardcode more into Dockerfiles, or maintain more machinery:

sloretz commented 3 months ago

Thanks for the thorough reply Ruffin!

To move away from dockerhub and the official docker library completely This would be a major change to our infrastructure, not to mention perhaps a bit disruptive or confusing [...]

Agreed. I think at most I would recommend hosting images on both Dockerhub and Github Packages - at least for ROS Distros that already exist on Dockerhub.

OSRF's github org would probably require significantly more resource credits to perpetually build and host all official ros docker images. E.g. GitHub Action credits for running CI jobs and GitHub container registry pull counts.

I wondered about that when I started, but at least on my personal account it seems both storage and data transfer are free. Who knows if Github will keep being this generous forever though.

If switching to Github Actions anyway, what would be the difference between using buildah vs. Github's official "Build and push Docker images" action? Is it the aspect of building images via CLI and scripts?

Building images via the CLI is the main thing I was looking for. When making github actions I've often found it easier to do most of the logic in a script where I can iterate locally.

Any advantage compared to buildkit's native CLI or full Python SDK?

I liked the experience using buildah because it makes it easy to create images without being root, but I don't know of any features buildah has that buildx lacks. I looked briefly at a few Python APIs, but I don't remember why I decided not to use any of them.

Another reason is probably the question of how to set non-default build ARG values via the docker library manifest, which expects a single/static Dockerfile, or AFAIK...

The static Dockerfile requirement does seem like a significant limitation on how images can be generated here. I really like how the build argument FROM statement turned out. There's some duplication between ROS 1 and 2, but overall not that much.

Maybe for the purposes of the Docker library we could reduce the Dockerfile generation to replacing the ARG and FROM lines with a static string?
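
A minimal Python sketch of that reduction, assuming a template Dockerfile that declares one global `ARG` and references it in `FROM`. The helper name `make_static` and the sample template are hypothetical. It only drops the declaration before the first `FROM` and substitutes the value into `FROM` lines; any use of the argument inside a build stage is left untouched.

```python
import re


def make_static(dockerfile: str, arg: str, value: str) -> str:
    """Inline one build ARG into FROM lines and drop its global declaration."""
    name = re.escape(arg)
    ref = re.compile(r"\$\{" + name + r"\}|\$" + name + r"\b")
    decl = re.compile(r"^ARG\s+" + name + r"(=.*)?\s*$")
    out = []
    seen_from = False
    for line in dockerfile.splitlines():
        if not seen_from and decl.match(line):
            continue  # the global ARG is not needed once FROM is static
        if line.upper().startswith("FROM "):
            seen_from = True
            line = ref.sub(lambda _: value, line)
        out.append(line)
    return "\n".join(out)


template = "ARG ROS_DISTRO=rolling\nFROM ros:$ROS_DISTRO\nRUN echo hello"
print(make_static(template, "ROS_DISTRO", "humble"))
# FROM ros:humble
# RUN echo hello
```

The generated file is a plain static Dockerfile, which is what the Docker library manifest expects, while the template with build arguments stays the file contributors edit.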

Instead, how much more difficult would it be to just bring ROS's packaging versioning scheme closer to something more easily cacheable, yet busted after periodic syncs?

Ah this is a tough one. The package versions are decided when the package is built, which is long before the sync happens. IIUC the sync just copies the packages from the testing apt repo into the main apt repo.