Open matthewfeickert opened 12 months ago
We already have lock files for pinning the base requirements, though these aren't yaml files: https://github.com/jupyterhub/repo2docker/tree/main/repo2docker/buildpacks/conda Is this a different type of lock file?
@manics This might be a `conda-lock` version issue. The `conda-lock` format was unified in `conda-lock` `v1.0.0` (c.f. https://github.com/conda/conda-lock/pull/124)
By default, `conda-lock` stores its output in `conda-lock.yml` in the current working directory. This file will also be used by default for render, install, and update operations. You can supply a different filename with e.g. `conda-lock --lockfile superspecial.conda-lock.yml`
It seems though, that yes, the format of what you have is different. Example:
compared to something like https://iris-hep.org/analysis-systems-env-nightlies/iris-hep-rc/3.11/conda-lock.yml
# This lock file was generated by conda-lock (https://github.com/conda/conda-lock). DO NOT EDIT!
#
# A "lock file" contains a concrete list of package versions (with checksums) to be installed. Unlike
# e.g. `conda env create`, the resulting environment will not change as new package versions become
# available, unless you explicitly update the lock file.
#
# Install this environment as "YOURENV" with:
# conda-lock install -n YOURENV --file conda-lock.yml
# To update a single package to the latest version compatible with the version constraints in the source:
# conda-lock lock --lockfile conda-lock.yml --update PACKAGE
# To re-solve the entire environment, e.g. after changing a version constraint in the source file:
# conda-lock -f iris-hep-rc/3.11/environment.yml --lockfile conda-lock.yml
version: 1
metadata:
  content_hash:
    linux-64: e002febb8b04300e80dded8f2b7dabb269ace11a83f98db20719007774f0f52c
  channels:
  - url: conda-forge
    used_env_vars: []
  platforms:
  - linux-64
  sources:
  - iris-hep-rc/3.11/environment.yml
package:
- name: _libgcc_mutex
  version: '0.1'
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
  hash:
    md5: d7c89558ba9fa0495403155b64376d81
    sha256: fe51de6107f9edc7aa4f786a70f4a883943bc9d39b3bb7307c04c41410990726
  category: main
  optional: false
- name: ca-certificates
  version: 2023.7.22
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.7.22-hbcca054_0.conda
  hash:
    md5: a73ecd2988327ad4c8f2c331482917f2
    sha256: 525b7b6b5135b952ec1808de84e5eca57c7c7ff144e29ef3e96ae4040ff432c1
  category: main
  optional: false
...
(edit)
Ah yes, here we go:
Pre 1.0 compatible usage (explicit per platform locks)
If you were making use of conda-lock before the 1.0 release that added unified lockfiles you can still get that behaviour by making use of the `explicit` output kind.
`conda-lock --kind explicit -f environment.yml`
So it seems that you're using the pre-`v1.0` explicit lock file format over the `v1.0+` unified lockfile.
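A rough way to tell the two formats apart without extra dependencies is to sniff the text. This is only a sketch: `detect_lock_kind` is a hypothetical helper (not part of conda-lock or repo2docker), and it assumes that explicit files carry a literal `@EXPLICIT` line while unified files have top-level `metadata:` and `package:` keys, as in the example above.

```python
def detect_lock_kind(text: str) -> str:
    """Guess whether lock file text is a unified YAML or an @EXPLICIT list.

    Hypothetical helper for illustration; returns "explicit", "unified",
    or "unknown".
    """
    stripped = [line.strip() for line in text.splitlines()]
    # Assumption: explicit lock files contain a bare "@EXPLICIT" marker line.
    if "@EXPLICIT" in stripped:
        return "explicit"
    # Collect top-level keys: unindented, non-comment lines containing a colon.
    top_level = {
        line.split(":", 1)[0]
        for line in text.splitlines()
        if line and not line[0].isspace() and not line.startswith("#") and ":" in line
    }
    # Assumption: unified lock files have both of these top-level keys.
    if {"metadata", "package"} <= top_level:
        return "unified"
    return "unknown"
```

Anything that comes back as `"unknown"` (for example a plain `environment.yml`) would fall through to whatever handling already exists.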
Supporting conda-lock outputs would be... nice, but would likely require some guardrails, and blessing some "r2d knows best" conventions.
`conda-lock` itself hauls in... a lot of dependencies, so may not be a good candidate for the "base coat" environment. `micromamba`, already present, is certainly up to the task of consuming both formats... though `pixi` very well might end up "winning" for this use case.
As, ideally, it would replace (not change) the `notebook` environment, supporting the raw lock (in either format) would ideally be able to preflight before doing a still-expensive download by:
- checking for the target platform (e.g. `linux-64`) in some non-cosmetic place
  - `/linux-64/` in any URL would probably be enough... not sure a `/noarch/` environment is even creatable at this time for anything except a "dataset" environment
- checking for `jupyterhub[singleuser]` at a version that would be at least compatible with the hosting hub...
  - `micromamba install jupyterhub-singleuser` could upgrade something in an undesirable way, even with a bunch of flags on it
The `yml` format directly supports dependencies from other package managers, like `pip` (and even other package manager managers like `poetry` and `pipenv`), while the `@EXPLICIT` format kinda half-supports them, but behind `#`s, so probably needs to be ignored entirely.
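For the `@EXPLICIT` format, that preflight could be done on the text alone, before anything is downloaded. A minimal sketch, assuming each package line is a URL whose last two path segments are the platform subdirectory and a `name-version-build` file name; `preflight_explicit` is a hypothetical helper, and it does not attempt the version-compatibility check against the hosting hub.

```python
from urllib.parse import urlparse


def preflight_explicit(text: str, platform: str = "linux-64") -> list[str]:
    """Return a list of problems found in an @EXPLICIT lock file's text.

    Hypothetical helper for illustration; an empty list means both
    checks passed.
    """
    subdirs = set()
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip comments (including pip lines hidden behind "#") and markers.
        if not line.startswith(("http://", "https://")):
            continue
        parts = urlparse(line).path.strip("/").split("/")
        if len(parts) >= 2:
            subdirs.add(parts[-2])
            # Package names may contain hyphens; version and build are the
            # last two hyphen-separated fields of the file name.
            names.add(parts[-1].rsplit("-", 2)[0])
    problems = []
    if platform not in subdirs:
        problems.append(f"no package URL under /{platform}/")
    if "jupyterhub-singleuser" not in names:
        problems.append("jupyterhub-singleuser not found")
    return problems
```

Returning the problems rather than raising leaves it to the caller to decide whether a failed check should abort the build or just warn.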
Thus far, there is no specific naming convention for A Well-Known Conda Lock File in a repo, as a number of "first-party" tools within the `conda` org don't agree on what the extension should even be:
- `conda` itself does not (and might not ever) support the `.yml` format
  - `conda env` won't support an `@EXPLICIT` file, only `conda create`
- `conda-lock` generates lockfiles like `conda-lock.yml` or `conda-{platform}.lock`
.yml
- `constructor` expects a `.txt` file for an `@EXPLICIT` (but has no guidance on the prefix)
  - it does not support the `.yml` format, even if `micromamba` is used as the solver, and would try to use it like an `environment.yml`
So, to tighten up the above as a recommendation:
- while a file has not been selected, for each of the below opinions (in basically this order, assuming `linux-64`):
  - `.binder/conda-lock.yml`
  - `.binder/conda-linux-64.lock`
  - `.binder/conda-linux-64.lock.txt`
  - `binder/conda-lock.yml`
  - `binder/conda-linux-64.lock`
  - `binder/conda-linux-64.lock.txt`
  - `conda-lock.yml`
  - `conda-linux-64.lock`
  - `conda-linux-64.lock.txt`
- if the selected file is a `.yml`, check that:
  - `linux-64` is found in `#/metadata/platforms/`
  - `#/package/` contains `name: jupyterhub-singleuser`
- otherwise, check that the file contains:
  - `/linux-64/`
  - `jupyterhub-singleuser`
- `COPY {the file} /tmp/`
- `micromamba env create --prefix {wherever/it/goes} --file /tmp/{the-file} && micromamba clean -yaf`
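The file-selection part of that recommendation could look roughly like the following. It is a sketch only: `select_lock_file`, `CONFIG_DIRS` and `FILE_NAMES` are hypothetical names, not existing repo2docker API, and the search order simply mirrors the list above.

```python
from pathlib import Path

# Searched in this order; "" stands for the repository root.
CONFIG_DIRS = (".binder", "binder", "")
FILE_NAMES = (
    "conda-lock.yml",
    "conda-{platform}.lock",
    "conda-{platform}.lock.txt",
)


def select_lock_file(repo_root, platform="linux-64"):
    """Return the first candidate lock file that exists, or None.

    Hypothetical helper for illustration only.
    """
    root = Path(repo_root)
    for config_dir in CONFIG_DIRS:
        for template in FILE_NAMES:
            candidate = root / config_dir / template.format(platform=platform)
            if candidate.is_file():
                return candidate
    return None
```

The first match wins, so a file under `.binder/` or `binder/` shadows one at the repository root.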
Chiming in here with a user experience, leading to a question about the above recommendation. My goals are to keep only my project's dependencies in an environment.yml with minimal pinning, have some lockfile for protection against untested updates, and to not conflict with packages added by the conda buildpack. I do not understand how I can create or use a lockfile that is aware of the package constraints introduced in the conda buildpack. Wouldn't the recommendation, which uses `create` rather than `update`, require me to include jupyterhub-singleuser and friends with all the repo2docker constraints? If it were `update` though, how/where/when could I invoke conda-lock on my environment.yml and repo2docker's environment.yml?
Aside: If not for some few packages that seem to need the notebook kernel to be the same as the environment running jupyterhub-singleuser, I would have used a separate environment for my project's kernel.
Proposed change
At the moment, `repo2docker` supports conda/mamba/micromamba `environment.yml` environment files as Binder config files. This is great, but even if you pin packages with `==`, their dependencies can still float, and so reproducibility into the future can break. For long-term reproducible builds (e.g. launching into Binder from a Zenodo DOI) you would want to be able to also have `repo2docker` work with lock files. As the conda ecosystem is already supported, a natural extension would be to use `conda-lock`, and with `mamba`/`micromamba` you can interact with `conda-lock` lock files on a nearly equal footing as you would an `environment.yml`.
However, at the moment, if you place a `conda-lock` lock file named `environment.yml` under a `binder/` directory in a repo, `repo2docker` will fail to build from it with an error (c.f. https://github.com/matthewfeickert-talks/talk-pyhep-2023/pull/5)
It would be super nice if `conda-lock` lock files could have support added for them as a valid `repo2docker` config file.
Alternative options
Though if that is too big of a feature request, it would be nice if there was a method to allow users to interact with a `conda-lock` lock file that works with `postBuild`. At the moment, if you try to have a `postBuild` config file that has
this will again fail with
While `micromamba` is able to handle a command like
it seems that `conda` can not and so similarly having a `postBuild` file with
will fail with
If installing an environment from a `conda-lock` lock file could be supported without supporting `conda-lock` itself, and instructions on how to work with `conda-lock` lock files were also added, this could resolve things as well.
Who would use this feature?
People that want to ensure that a Binder link will run far into the future (so maybe the same people that put things on Zenodo).
How much effort will adding it take?
I'm not sure. I would hope not much, but I haven't taken the time to look at how `repo2docker` currently supports all the config files it already does.
Who can do this work?
Someone with familiarity with `conda-lock`.