
Has anyone looked into using the uv package manager, yet? #2175

Open creative-resort opened 4 days ago

creative-resort commented 4 days ago

What docker image(s) is this feature applicable to?

docker-stacks-foundation

What change(s) are you proposing?

https://github.com/astral-sh/uv https://docs.astral.sh/uv/

I'm seeing Mamba / Micromamba not being able to get off the ground; it doesn't even have proper documentation of its own – one always has to refer to the conda docs.

uv offers the package and code-project handling that Python programmers are looking for, especially those working with multiple Python kernel environments.

How does this affect the user?

It is not going to be a replacement for Mamba, as it doesn't support non-Python packages, but it may be the right step toward well-appreciated future support for Python applications with Jupyter – for Python developers looking to get the benefits of uv that they will soon be used to.

Anything else?

Having uv pre-installed in the stack for Python projects is the idea to explore.
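For illustration, here is roughly what uv-based environment handling could look like inside a running container (the env path and package names are only examples; uv CLI as of late 2024):

# create a project virtualenv with uv, then install packages into it
uv venv /home/jovyan/envs/myproject
uv pip install --python /home/jovyan/envs/myproject/bin/python jupyterlab ipykernel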

manics commented 4 days ago

Mamba is a re-implementation of conda, so although it would be nice for the mamba docs to be self-contained, it seems OK to refer to the conda docs.

uv is available on conda-forge https://anaconda.org/conda-forge/uv

I agree that uv sounds very interesting and it's something I'm planning to try, but given that it's still relatively new (announced less than a year ago), I don't think it should be installed by default yet, especially since it's trivial to install yourself.
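In the meantime, a minimal sketch of layering uv onto one of these images yourself via conda-forge (the image tag is just an example):

# Dockerfile: add uv on top of an existing stack image
FROM quay.io/jupyter/base-notebook:latest
RUN mamba install --yes uv && \
    mamba clean --all -f -y && \
    fix-permissions "${CONDA_DIR}" && \
    fix-permissions "/home/${NB_USER}"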

mathbunnyru commented 4 days ago

I'm seeing Mamba / Micromamba not being able to get off the ground

What makes you say that? Mamba is quite active in development, works well, and is fast (I haven't made any measurements, but it is much faster than conda) – all while being a drop-in replacement for conda, which is why we were able to switch so easily.

one always has to refer to the conda docs.

That's true, but I don't think there is a much better solution.

It is not going to be a replacement for Mamba, as it doesn't support non-Python packages, but it may be the right step toward well-appreciated future support for Python applications with Jupyter – for Python developers looking to get the benefits of uv that they will soon be used to.

Unfortunately, there are some big downsides to using uv as well:

  1. If we manage our dependencies with uv, anyone who builds an image on top of ours will have to rewrite their Dockerfiles (a huge breaking change).
  2. uv doesn't support conda environments, and that doesn't seem to be a priority right now: https://github.com/astral-sh/uv/issues/1703.
  3. The missing support is not just for conda envs but also for conda-forge packages, which I assume would be a huge problem for many people.

That being said, uv seems to be a cool tool, and everyone is welcome to install it on top of our images and use it.

Having uv pre-installed in the stack for Python projects is the idea to explore.

Sure, let's keep this issue open for a month to at least gather some feedback.

yuvipanda commented 4 days ago

uv also has a different funding model (currently venture-capital backed, I think?), so that is something to take into consideration as part of a holistic approach too.

james-ro-williams commented 1 day ago

Hopping on this discussion with a related question/concern: how do you suggest we lock the versions of Python packages that we install on top of your images? For example, if I'm building an image on top of the SciPy Dockerfile here, some Python packages are installed using mamba, as expected.

However, the recommended way to add more Python packages to an image built on top is just to use pip inside the environment that mamba creates. That's fine, but pip doesn't do its own dependency locking. So if I want a reproducible image, am I supposed to handle my extra requirements myself (with a tool like Poetry), export a requirements.txt with version locking, and install them that way? Doing that still leaves you open to dependency clashes with the pre-installed Python packages, so I'm at a bit of a loss.

minrk commented 1 day ago

Layered approaches like these 'stacks' are not very well suited to lockfiles, which typically assume a single install command for everything. You might have a better experience going all-in on tooling like pixi/Poetry/etc. to manage a whole env, rather than adding to an existing one, whether you start from these images or not.

But if you do want to pip install a few packages on top of one of these base images and lock just what you install, I think the way to go is to generate a constraints file from your chosen base image and include it as an input to your lock file generation.

You can generate the constraints file by running in the base environment:

pip list --format=freeze > base-env.txt

And then put your dependencies in a requirements.in and lock it with e.g.:

pip-compile -c base-env.txt

For example, this script will run pip-compile to generate a locked requirements.txt from a requirements.in and the base environment:

# lock.sh:
#!/bin/sh
set -eux

# must match FROM in Dockerfile to work correctly,
# and must be a strictly tagged base image
base_image=quay.io/jupyter/minimal-notebook:2024-07-08

# generate a constraints file from the base image,
# then run pip-compile to produce a locked requirements.txt,
# constraining any dependencies to what's already in the base env
docker run --rm -v "$PWD":/io -w /io "$base_image" sh -c \
    "pip list --format=freeze > base-env.txt && pip install pip-tools && pip-compile -c base-env.txt requirements.in"

# requirements.in (the unlocked input file next to lock.sh):
altair

Then your Dockerfile uses the locked requirements.txt and the same base image tag (maybe sync in your build automation with --build-arg base_image=...):

# must always use a pinned image for locking to work
ARG base_image=quay.io/jupyter/minimal-notebook:2024-07-08
FROM $base_image
# the locked requirements.txt generated by lock.sh
COPY requirements.txt /requirements.txt
RUN pip install --no-cache-dir -r /requirements.txt

The output of the pip install shows that it's only installing the actually missing packages, despite several dependencies also having updates available at install time:

#7 [3/3] RUN pip install --no-cache -r /requirements.txt
#7 0.717 Collecting altair==5.5.0 (from -r /requirements.txt (line 7))
#7 0.857   Downloading altair-5.5.0-py3-none-any.whl.metadata (11 kB)
#7 0.860 Requirement already satisfied: attrs==23.2.0 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 9)) (23.2.0)
#7 0.861 Requirement already satisfied: jinja2==3.1.4 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 14)) (3.1.4)
#7 0.862 Requirement already satisfied: jsonschema==4.22.0 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 18)) (4.22.0)
#7 0.863 Requirement already satisfied: jsonschema-specifications==2023.12.1 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 22)) (2023.12.1)
#7 0.863 Requirement already satisfied: markupsafe==2.1.5 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 26)) (2.1.5)
#7 0.908 Collecting narwhals==1.14.2 (from -r /requirements.txt (line 30))
#7 0.922   Downloading narwhals-1.14.2-py3-none-any.whl.metadata (7.5 kB)
#7 0.924 Requirement already satisfied: packaging==24.1 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 32)) (24.1)
#7 0.925 Requirement already satisfied: referencing==0.35.1 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 36)) (0.35.1)
#7 0.926 Requirement already satisfied: rpds-py==0.18.1 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 41)) (0.18.1)
#7 0.927 Requirement already satisfied: typing-extensions==4.12.2 in /opt/conda/lib/python3.11/site-packages (from -r /requirements.txt (line 46)) (4.12.2)
#7 0.987 Downloading altair-5.5.0-py3-none-any.whl (731 kB)
#7 1.067    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.2/731.2 kB 9.4 MB/s eta 0:00:00
#7 1.081 Downloading narwhals-1.14.2-py3-none-any.whl (225 kB)
#7 1.102    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 225.1/225.1 kB 12.4 MB/s eta 0:00:00
#7 1.499 Installing collected packages: narwhals, altair
#7 1.865 Successfully installed altair-5.5.0 narwhals-1.14.2
#7 DONE 2.1s

You will need to re-run lock.sh every time you update your FROM image tag to get the correct base-env.txt constraints file.
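One way to automate that (a sketch – the script name and tag are illustrative; it inlines lock.sh's commands so the tag only has to be bumped in one place):

# build.sh:
#!/bin/sh
set -eux
base_image=quay.io/jupyter/minimal-notebook:2024-07-08

# regenerate base-env.txt and requirements.txt against this exact tag
docker run --rm -v "$PWD":/io -w /io "$base_image" sh -c \
    "pip list --format=freeze > base-env.txt && pip install pip-tools && pip-compile -c base-env.txt requirements.in"

# build with the same tag the lock was generated against
docker build --build-arg base_image="$base_image" -t my-notebook .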

creative-resort commented 1 day ago

I'm seeing Mamba / Micromamba not being able to get off the ground

What makes you say that?

For example, cloning an environment is supposed to save space by symlinking resources from the cloned (base) environment, just as with conda. That's not what happens with mamba, though.

There are more hiccups (at first sight) that one stumbles upon in practice – issues that have sat open and unresolved for years – and they amount to roadblocks and time wasters that one has to tediously figure out workarounds for. That's why I'm saying it's still in its infancy.

james-ro-williams commented 1 day ago

Layered approaches like these 'stacks' are not very well suited to lockfiles, which typically assume a single install command for everything. You might have a better experience going all-in on tooling like pixi/Poetry/etc. to manage a whole env, rather than adding to an existing one, whether you start from these images or not.

I would be more than willing to use Poetry; I use it all over the place. However, if I want the functionality of the Jupyter images (startup scripts etc.), I'm somewhat hamstrung into using Conda and trying to make the best of it. I'm currently toying with a solution where I maintain a Poetry project containing my extra dependencies, create a version-locked requirements.txt from it, and install that into the Conda environment with pip install -r generated_requirements.txt. This seems to be working, as the version-locked extra dependencies take precedence over any un-version-locked packages already installed, so I think this is the best I can get for the moment.
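Concretely, the export step of that workflow looks something like this (file names are illustrative; recent Poetry versions need the poetry-plugin-export plugin for poetry export):

# export Poetry's lock as a pinned requirements file,
# then install it into the image's conda-provided Python
poetry export -f requirements.txt --output generated_requirements.txt
pip install -r generated_requirements.txt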

manics commented 1 day ago

@james-ro-williams have you tried the foundation image? https://github.com/jupyter/docker-stacks/tree/main/images/docker-stacks-foundation It's pretty minimal but still includes support for startup scripts.

minrk commented 19 hours ago

I'm somewhat hamstrung into using Conda and trying to make the best of it.

I don't think that's the case. Python itself may have been installed by conda, but that's about it if you want to create and use an env with poetry in the image. You don't even have to use that Python, though it seems like a bit of a waste of space not to. As @manics pointed out, docker-stacks-foundation is probably the lightest image to use if you aren't going to be using the stack envs. Here's a sample of building an env entirely controlled by poetry, using only the Python installed in the foundation image: https://gist.github.com/minrk/063fe9798315738bed48df822c671615 – with all the benefits of startup scripts, entrypoints, etc. in these images.
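One possible shape for such an image (a rough sketch with hypothetical file names, not the gist's exact contents):

# Dockerfile: a poetry-managed env on top of docker-stacks-foundation
FROM quay.io/jupyter/docker-stacks-foundation:2024-07-08

USER ${NB_UID}
WORKDIR /home/${NB_USER}

# install poetry with the image's conda-provided Python,
# then let poetry create and own an in-project env
COPY pyproject.toml poetry.lock ./
RUN pip install --no-cache-dir poetry && \
    poetry config virtualenvs.in-project true && \
    poetry install --no-root

# the image's entrypoint (start.sh) still runs the usual startup
# hooks before exec-ing this command; jupyterlab must be a poetry
# dependency, since the foundation image ships no Jupyter apps
CMD [".venv/bin/jupyter", "lab", "--ip=0.0.0.0"]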

Pip's support for constraint files probably does make pip-compile the simplest choice if you want to actually build on the packages these stacks install while keeping your env locked.

But with all that said, you may well be better off creating your own images if you prefer a different set of tools, and just using these as a reference for how you might go about it. I think there is sometimes a misconception that these images are the "one true way" to run Jupyter in a container, when they are really just "one way." It really is fine, and often preferable, to build your own images to meet your own needs; they do not need to be as complex as these.