jupyterhub / team-compass

A repository for team interaction, syncing, and handling meeting notes across the JupyterHub ecosystem.
http://jupyterhub-team-compass.readthedocs.io

build container images from a different repo #474

Open minrk opened 5 years ago

minrk commented 5 years ago

We currently build the jupyterhub/jupyterhub and jupyterhub/singleuser images from this repo (edit: this repo is jupyterhub/jupyterhub; the issue was transferred to jupyterhub/team-compass). That's a little funky, because it means building both stable versions (e.g. 1.0.0) and master from the repo tracking master, and it makes it hard to separate the code for building images from the source of the application.

I think we should make a dedicated repo for building the jupyterhub docker images. This would make it easier to have variants, e.g.

minrk commented 2 years ago

Moving this to team-compass for discussion, since it's really an organizational issue. I don't think we should build docker images from package repos themselves. I think we should have a dedicated repo for building our collection of docker images.

I think there are problems associated with building images from the repos containing the packages themselves. In particular, there's no mechanism to produce security updates (or any other kind of updates) for a versioned jupyterhub image once a tag has been published.

In general, I think it probably makes sense to have a docker-images repo containing whatever images we want to maintain, with CI for building them from 'supported' release tags (e.g. latest 1.x, prerelease, and main). This will make it much easier to maintain variations (e.g. alpine, demo) without requiring a jupyterhub release to publish them, to keep up with security patches in the base environment, etc.
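For illustration, a minimal sketch of what CI in such a repo could look like; the version numbers and the JUPYTERHUB_VERSION build arg are assumptions, not a description of an existing setup:

```bash
#!/usr/bin/env bash
# Hypothetical build script for a dedicated images repo.
# The versions to build are data in this repo, not git tags on
# jupyterhub/jupyterhub, so rebuilding (e.g. for a base-image
# security patch) never requires a jupyterhub release.
set -euo pipefail

# 'supported' release lines: latest stable, prerelease, and main
VERSIONS=("1.5.0" "2.0.0b1" "main")

for v in "${VERSIONS[@]}"; do
  # assumes a Dockerfile that installs jupyterhub from PyPI
  # (or from the GitHub archive when building main)
  docker build \
    --build-arg JUPYTERHUB_VERSION="$v" \
    --tag "jupyterhub/jupyterhub:$v" \
    .
done
```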

Then we should remove all image-building from the jupyterhub/jupyterhub repo, and any other repo that contains a package and its own image, if there are any (helm chart repos not included).

manics commented 2 years ago

I think building container images from a separate repo makes sense for stable projects.

How do you envisage the single repo with multiple independently released images working? Do you already know of an example we can copy?

minrk commented 2 years ago

Maybe it should be multiple repos! All of the images currently built in jupyterhub/jupyterhub should have the same tags at the same time, though, so I think it makes sense to bundle them. The main thing is that publishing tagged images should not be triggered by pushing tags on repos; they should be managed manually in workflow variables.

I was thinking one repo would make sense for organizational purposes, and maybe separate workflows for collections of images?

manics commented 2 years ago

> The main thing is that publishing tagged images should not be triggered by pushing tags on repos; they should be managed manually in workflow variables.

OK, that's the key point that makes a single repo feasible :smiley:

I can think of two high-level ways of structuring the repo:

**Everything in the same repo**

The Dockerfiles and build pipelines are fully self-contained: the only interaction they have with released artifacts is through PyPI or NPM; they never reference the source GitHub repo. This makes it easy to see all images and provides a clean separation, but if there are significant changes in the upstream repo (e.g. a new dependency) you need to coordinate changes across multiple repos.

**Triggers and common config in the same repo, build details in the source repo**

The single repo manages the triggers and common config, but delegates the building to the separate repos. For example, perhaps the jupyterhub job would take the container tag 2.0.0-1 and the jupyterhub repo tag/commit you want to build, but it wouldn't know there are 3 or 4 separate images. If that's handled in the jupyterhub/jupyterhub repo, you can easily add/remove an image, and it's easier to release a backport.
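As a rough sketch of that delegation, the central repo's job could be little more than a dispatch call. This assumes a hypothetical workflow_dispatch workflow named build-images.yml in jupyterhub/jupyterhub:

```bash
# Hypothetical: the central repo triggers a build in the source repo,
# passing only the container tag and the git ref to build. It doesn't
# know (or care) how many images that workflow produces.
gh workflow run build-images.yml \
  --repo jupyterhub/jupyterhub \
  --ref 2.0.0 \
  -f container_tag=2.0.0-1
```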

As an aside, I lean towards container-images instead of docker-images.

minrk commented 2 years ago

> the only interaction they have with released artifacts is through PyPI or NPM

Yes, that's what I'm thinking. Unless we also support a dev build, but even that differs only by installing https://github.com/jupyterhub/jupyterhub/archive/HEAD.zip instead of jupyterhub==1.2.3. Nothing else is needed.
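Concretely, the two builds would differ by a single install line, something like:

```bash
# release image: install a pinned release from PyPI
pip install jupyterhub==1.2.3

# dev image: identical except for installing the repo archive instead
pip install https://github.com/jupyterhub/jupyterhub/archive/HEAD.zip
```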

> if there are significant changes in the upstream repo (e.g. a new dependency) you need to coordinate changes across multiple repos.

I'm not quite sure what that means. In this approach, dependencies are fully encapsulated in the packages themselves, so already expressed via the pip/npm install. The image-building repo would not need any changes if the upstream package changed its dependencies. Plus, that would only propagate to the image-building repo on a version update, which will always be explicit.

This does make me think it would be good to maintain a requirements.in/txt with pip-compile for these images.
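For example (a sketch assuming pip-tools; the file contents are illustrative): a hand-edited requirements.in holds only the top-level pins, and the fully pinned requirements.txt that pip-compile generates is committed, so dependency updates land as explicit diffs in the image repo:

```bash
# requirements.in holds only the direct, intentional pins
cat > requirements.in <<'EOF'
jupyterhub==1.2.3
EOF

# pip-compile (from pip-tools) resolves and pins the full dependency
# tree; re-running it is how the image picks up dependency updates
pip install pip-tools
pip-compile --output-file requirements.txt requirements.in
```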

> As an aside, I lean towards container-images instead of docker-images.

Reasonable! I'm also fine with just images.

Let's maybe try a single repo for now, and see if anything gets in the way.

Are there any other images being built from a package repo? I think it's just the 3 images in jupyterhub/jupyterhub for now.

minrk commented 2 years ago

I've been thinking about this on and off, and it seems hard to design a repo for building multiple images that automatically builds the images when you want to, but doesn't rebuild all the images too often.

One repo per image also seems like a pain, because there would be so much shared infrastructure (e.g. jupyterhub version bump requiring updates to every repo).

Maybe this is tractable with chartpress-like detect-changes-in-subdirectory logic, but I don't want it to get too complicated.
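For reference, that logic could be as simple as diffing each image's subdirectory against the ref of the last published build (a sketch; the images/ layout and the ref argument are assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical change detection: rebuild only images whose
# subdirectory changed since the last published build.
set -euo pipefail

LAST_BUILD_REF="${1:?usage: changed-images.sh <last-build-ref>}"

for dir in images/*/; do
  # git diff --quiet exits non-zero if anything under $dir changed
  if ! git diff --quiet "$LAST_BUILD_REF" -- "$dir"; then
    echo "rebuild needed: $dir"
  fi
done
```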

> As an aside, I lean towards container-images instead of docker-images.

FWIW, as soon as I started playing around and realized there would be forks on contributors' orgs, jupyterhub-images seemed like the right thing, because what is minrk/container-images?

manics commented 2 years ago

All good points!

In previous orgs I've used one repo per container build, but with shared infrastructure in a common repo. For instance, each build runs git clone <github>/build-scripts; ./build-scripts/build.sh, or uses a shared GitHub Action, neither pinned to a fixed version. That way, updating the build-scripts means all repos pick up the change the next time they run.
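A minimal sketch of what such a shared script might look like (hypothetical; the arguments and the push step are assumptions):

```bash
#!/usr/bin/env bash
# Hypothetical build-scripts/build.sh, shared across image repos.
# Each image repo clones this repo unpinned and calls it, so a fix
# here propagates to every repo on its next CI run.
set -euo pipefail

IMAGE="${1:?usage: build.sh <image-name> [tag]}"
TAG="${2:-latest}"

docker build --tag "$IMAGE:$TAG" .
docker push "$IMAGE:$TAG"
```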

I've also worked with a parent repo that includes all the other repos as submodules.

Have you started work on this yet? If not, I can try and create an example.

minrk commented 2 years ago

> Have you started work on this yet?

Not to any useful degree, just a little messing around and copying files.

> I can try and create an example.

That would be great!

manics commented 2 years ago

I wrote a notebook to find all Dockerfiles in the JupyterHub org repos.
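A rough shell equivalent of that search, using the GitHub code search API (a sketch; it assumes a GITHUB_TOKEN with code search access and uses jq for extraction):

```bash
# List jupyterhub org repos containing a Dockerfile, via the
# GitHub code search API (authentication is required for code search)
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/search/code?q=org:jupyterhub+filename:Dockerfile&per_page=100" \
  | jq -r '.items[].repository.full_name' | sort -u
```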

Ones to consider:

- configurable-http-proxy
- docker-image-cleaner
- jupyterhub
- repo2docker

We can probably ignore the rest, though some of these are out of date, so they could maybe switch to depending on latest instead of an outdated pinned version of an upstream image?

- binder
- binderhub
- chartpress
- dockerspawner
- jupyter-remote-desktop-proxy
- jupyterhub-deploy-docker
- jupyterhub-example-kerberos
- jupyterhub-on-hadoop
- kerberosauthenticator
- mybinder.org-deploy
- oauthenticator
- repo2docker-action
- sudospawner
- the-littlest-jupyterhub
- zero-to-jupyterhub-k8s

Overall it doesn't look too bad, as only JupyterHub has multiple images in one repo, and AFAICT we don't publish many Docker images outside chartpress.

minrk commented 2 years ago

That's a great overview! I think we can start with just jupyterhub. The main problem I want to solve is that the release cycle of a package doesn't match the appropriate update cycle of its images. This mainly applies to jupyterhub and configurable-http-proxy. repo2docker is another image built from its own package repo, but the issue doesn't come up so much because we publish that one continuously (I haven't ever wanted to go back and update a tagged repo2docker image). I think there's also a lot less pressure on CHP, since it's so single-purpose, but the same principle does technically apply. I think we can wait on that one.

jupyterhub is the main image where I think it makes sense to publish several flavors:

and those all have reasonable expectations of changes not tied to the releases of JupyterHub itself.