jupyter / docker-stacks

Ready-to-run Docker images containing Jupyter applications
https://jupyter-docker-stacks.readthedocs.io

Build foundation, base, minimal image variants for different Ubuntu/Python versions #2139

Closed mj0nez closed 2 months ago

mj0nez commented 3 months ago

What docker image(s) is this feature applicable to?

base-notebook, docker-stacks-foundation, minimal-notebook

What change(s) are you proposing?

Hi,

I would like to propose the addition of new variants for the images docker-stacks-foundation, base-notebook and minimal-notebook.

How does this affect the user?

All images are currently pinned to one Python version (3.11, with an open PR #2072 to upgrade to 3.12). The problem is that this forces downstream consumers to follow the project’s Python version rather than choosing a version on their own. Although JupyterLab supports all current Python versions, the project’s Python version is restricted to the versions supported by every dependency across the whole stack (scipy, pytorch, datascience, pyspark, allspark).

Providing more variants for these “base images” would loosen upstream restrictions and allow more teams to use the stack while managing their own dependencies. While this will definitely introduce some changes to the pipelines and increase build time and registry storage, I believe enabling wider adoption of this stack is worth it.

Kind regards

Anything else?

No response

mathbunnyru commented 3 months ago

I have thought about this a lot in the past, and let me tell you my thoughts:

  1. The problem is that this forces downstream consumers to follow the project’s Python version rather than choosing a version on their own.

    I partially agree with this statement: we have a nice tagging system and never delete old images, so users can choose the older Python versions they need. They will get old packages, though. A new Python version is more difficult: when do we decide to start building images for it? Right now we have quite a simple strategy: "when a new version is supported by all the libraries we use". What if some images support the new Python, while others don't? What if it works differently for aarch64 and x86_64?

  2. The Python version is one of the most important things that our images have. But so is the Ubuntu version, and maybe Jupyter Notebook and Jupyter Lab major versions are also quite important. I think if we start building for all possible versions and combinations, then we're gonna have many new problems - more builds will fail (dependency management is difficult, and GitHub also fails more often than desired).
  3. We will need to think about how we tag our images, it's not going to be as straightforward.
  4. We don't have the computing power to build many aarch64 images without sacrificing build time. This will be less of a problem when GitHub aarch64 Linux runners are out of beta (right now we use a small number of self-hosted aarch64 runners).
  5. I think there will be much more maintenance burden to keep up with what to build and if something builds at all.
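The strategy from point 1 (start building for a new Python version only once every library supports it) can be sketched as a small check. This is a minimal illustration, not the project's actual tooling: the support table below is hypothetical example data, chosen to show how the answer can differ between aarch64 and x86_64.

```python
"""Sketch: find the newest Python version that every library supports.

The support table is HYPOTHETICAL example data, not real compatibility info.
"""

# Hypothetical: (library, architecture) -> Python versions it can be installed on.
SUPPORT: dict[tuple[str, str], set[str]] = {
    ("jupyterlab", "x86_64"): {"3.10", "3.11", "3.12"},
    ("jupyterlab", "aarch64"): {"3.10", "3.11", "3.12"},
    ("pytorch", "x86_64"): {"3.10", "3.11", "3.12"},
    ("pytorch", "aarch64"): {"3.10", "3.11"},
    ("pyspark", "x86_64"): {"3.10", "3.11", "3.12"},
    ("pyspark", "aarch64"): {"3.10", "3.11", "3.12"},
}


def version_key(version: str) -> tuple[int, ...]:
    """Compare versions numerically, so that "3.10" sorts after "3.9"."""
    return tuple(int(part) for part in version.split("."))


def newest_common_version(support: dict[tuple[str, str], set[str]]) -> str:
    """Newest Python version present in every (library, architecture) entry."""
    common = set.intersection(*support.values())
    if not common:
        raise ValueError("no Python version is supported by every library")
    return max(common, key=version_key)


def newest_per_arch(support: dict[tuple[str, str], set[str]]) -> dict[str, str]:
    """Apply the same rule separately for each architecture."""
    arches = sorted({arch for _, arch in support})
    return {
        arch: newest_common_version(
            {key: versions for key, versions in support.items() if key[1] == arch}
        )
        for arch in arches
    }


if __name__ == "__main__":
    print(newest_common_version(SUPPORT))  # 3.11 (held back by pytorch on aarch64)
    print(newest_per_arch(SUPPORT))  # {'aarch64': '3.11', 'x86_64': '3.12'}
```

With this data, a single stack-wide rule stays on 3.11, while a per-architecture rule would already move x86_64 to 3.12; that split is exactly the kind of decision the comment warns about.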

Please note that I'm not saying this is a bad idea, but I want to underline all the issues we will probably have with this approach. This change will require lots of effort, not just in making these decisions, but also in rewriting our GitHub workflows and making sure they behave correctly not only when everything works, but also when some random build fails and we need a restart.

A few more thoughts on how one can use our images in some specific cases:

  1. Most of the time our images are quite good 'as is'. But it's fine and encouraged to build on top of our images, install other packages, and update/downgrade existing ones. Nothing wrong with it, and in my opinion, the adoption of our images is great, if we consider these use cases.
  2. We even have a template project for setting up your own build pipeline for a custom image: https://jupyter-docker-stacks.readthedocs.io/en/latest/contributing/stacks.html
  3. I've seen some people fork this repo and change a few things they don't like and it works quite well - and this is also something I keep in mind when accepting new changes.

Update: now we have an example of how to use docker bake to build a custom set of images easily: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html#building-stack-images-with-custom-arguments
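A rough sketch of the bake approach, generating a bake definition in its JSON form. It assumes a local checkout with the Dockerfiles under `images/<name>/`, a `PYTHON_VERSION` build arg in docker-stacks-foundation, and a parent-image build arg in the downstream Dockerfiles (called `BASE_IMAGE` here, a hypothetical name to check against each Dockerfile's `FROM` line). The chain docker-stacks-foundation, base-notebook, minimal-notebook is linked with `target:` named contexts so each image builds on the freshly built parent. The linked recipe remains the authoritative reference.

```python
"""Sketch: generate a docker bake file (JSON form) that rebuilds
docker-stacks-foundation -> base-notebook -> minimal-notebook for another Python.

ASSUMPTIONS (check against the repository and the linked recipe):
  * Dockerfiles live in <repo_dir>/images/<image-name>/;
  * docker-stacks-foundation reads a PYTHON_VERSION build arg;
  * downstream Dockerfiles use `FROM $<PARENT_ARG>`; the arg name is hypothetical.
"""
import json

IMAGES = ["docker-stacks-foundation", "base-notebook", "minimal-notebook"]
PARENT_ARG = "BASE_IMAGE"  # hypothetical build-arg name for the parent image


def bake_definition(python_version: str, repo_dir: str = "docker-stacks") -> dict:
    targets: dict[str, dict] = {}
    parent = None
    for image in IMAGES:
        target: dict = {
            "context": f"{repo_dir}/images/{image}",
            "tags": [f"{image}:python-{python_version}"],  # local tags, naming is illustrative
            "args": {},
        }
        if parent is None:
            # Only the first image in the chain installs Python.
            target["args"]["PYTHON_VERSION"] = python_version
        else:
            # Point the parent build arg at a name, then map that name to the
            # parent bake target, so FROM uses the image built in this run.
            target["args"][PARENT_ARG] = parent
            target["contexts"] = {parent: f"target:{parent}"}
        targets[image] = target
        parent = image
    # Building the last image pulls in its parents through the target: contexts.
    return {"group": {"default": {"targets": [IMAGES[-1]]}}, "target": targets}


if __name__ == "__main__":
    print(json.dumps(bake_definition("3.12"), indent=2))
```

Usage, assuming the script is saved as `make_bake.py` (a hypothetical name) next to the checkout: `python make_bake.py > docker-bake.json`, then `docker buildx bake -f docker-bake.json`.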

Hope this helps.

mathbunnyru commented 2 months ago

I updated the issue name to make it slightly more general. I would like to gather some feedback from our users if this feature is actually worth investing lots of time - if this issue gets commented a lot like "I would love to use an image with Ubuntu 22 with Python 3.10" for example, then it would be a good reason to implement this. If not - maybe our images already work in most cases.

minrk commented 2 months ago

Adding any axis leads to an explosion of build times and images, so I think it's appropriate to keep this as limited as possible. Making it easier for folks to do their own builds is part of relieving the pressure on that. So I don't think allowing the base distro image to vary is worth that cost. Supporting more than one Python version may be, though, but I'd keep it quite limited (maybe not more than 2, drop one when adding the next, etc.).
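To make the cost of an extra axis concrete, here is a back-of-the-envelope sketch with hypothetical version lists: every axis multiplies the number of builds, and a rolling window of at most two Python versions (drop the oldest when adding the next) keeps that axis bounded. The window helper is an illustration of the idea, not existing project code.

```python
"""Sketch: build-matrix growth per added axis, and a rolling window that caps
the Python axis at two versions. Version lists are HYPOTHETICAL examples."""
from itertools import product

IMAGES = ["docker-stacks-foundation", "base-notebook", "minimal-notebook"]
ARCHES = ["x86_64", "aarch64"]


def build_matrix(images, pythons, ubuntus, arches):
    """Every (image, python, ubuntu, arch) combination needs its own build."""
    return list(product(images, pythons, ubuntus, arches))


def add_python(window: list[str], new_version: str, max_versions: int = 2) -> list[str]:
    """Add the newest Python version and drop the oldest beyond max_versions.

    Assumes versions are always added in ascending order.
    """
    return (window + [new_version])[-max_versions:]


if __name__ == "__main__":
    print(len(build_matrix(IMAGES, ["3.11"], ["24.04"], ARCHES)))  # 6
    print(len(build_matrix(IMAGES, ["3.10", "3.11", "3.12"], ["24.04"], ARCHES)))  # 18
    print(len(build_matrix(IMAGES, ["3.10", "3.11", "3.12"], ["22.04", "24.04"], ARCHES)))  # 36

    window = ["3.11"]
    for version in ["3.12", "3.13"]:
        window = add_python(window, version)
    print(window)  # ['3.12', '3.13']
    print(len(build_matrix(IMAGES, window, ["24.04"], ARCHES)))  # 12
```

Only three images are counted here; the same multiplication applies to every image in the stack.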

So I'd weigh how easy it is for us to build Python version variants against how easy it is for someone who wants a different base stack to build their own, and emphasize this in the docs.

consideRatio commented 2 months ago

Just continuously building one recent version of Ubuntu, Python, R, and Julia, and letting users use tags of old versions that no longer get rebuilt, captures a lot of what users benefit from, I think.

I see some value in building multiple versions (say, Python 3.11 and Python 3.12), as it can allow a user to stay on an older Python version for a while while also staying up to date with pre-installed software. This could provide some breathing room to transition, but I think users must transition no matter what, so providing more than two versions of Python seems far too much.

Overall, I think it isn't worth the maintenance complexity of adding multiple versions of any of Ubuntu, Python, R, or Julia.

mathbunnyru commented 2 months ago

I made some documentation improvements in https://github.com/jupyter/docker-stacks/pull/2144

manics commented 2 months ago

If there's significant demand for other combinations of Python or Ubuntu versions, I think it should be done in a separate repository following the suggestions in https://github.com/jupyter/docker-stacks/pull/2144

I don't think the added complexity of doing it in this repository is worth it. There's almost no overlap in the images, and therefore no benefit in using the same tags.

mathbunnyru commented 2 months ago

Thanks for all the ideas.

I updated the docs and they now better show how one can build a custom set of images: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/custom-images.html

I would like to keep this issue open for a month - if we receive lots of requests from users, then we might have to reconsider. If not, I will close the issue.

mathbunnyru commented 2 months ago

This issue has been open for a month now and the situation has been improved (docker bake + docs restructuring), and I don't see any complaints about the current state, so I will close the issue.

Feel free to share your opinion, though, if this is not good enough for someone.