Provide support or recommendation for how to interact with conda-lock lockfiles

Proposed change

At the moment, repo2docker supports conda/mamba/micromamba environment.yml files as Binder config files. This is great, but even if you pin packages with ==, their dependencies can still float, so reproducibility can break in the future. For long-term reproducible builds (e.g. launching into Binder from a Zenodo DOI) you would also want repo2docker to work with lock files. As the conda ecosystem is already supported, a natural extension would be to use conda-lock, and with mamba/micromamba you can interact with conda-lock lock files on nearly equal footing with an environment.yml.

However, at the moment, if you place a conda-lock lock file named environment.yml under a binder/ directory in a repo, repo2docker will fail to build from it and error with

EnvironmentSectionNotValid: The following sections on '/home/jovyan/binder/environment.yml' are invalid and will be ignored:
 - version
 - metadata
 - package

(cf. https://github.com/matthewfeickert-talks/talk-pyhep-2023/pull/5)

It would be super nice if repo2docker supported conda-lock lock files as a valid config file.

Alternative options

If that is too big of a feature request, it would be nice to have a way for users to work with a conda-lock lock file from postBuild. At the moment, if you have a postBuild config file that contains

conda env update --file binder/conda-lock.yml --prune

this will again fail with

EnvironmentSectionNotValid: The following sections on '/home/jovyan/binder/environment.yml' are invalid and will be ignored:
 - version
 - metadata
 - package

While micromamba is able to handle a command like

micromamba install --file binder/conda-lock.yml

it seems that conda cannot, so similarly a postBuild file with

conda install --file binder/conda-lock.yml

will fail with

CondaValueError: could not parse 'version: 1' in: binder/conda-lock.yml

If installing an environment from a conda-lock lock file could be supported without supporting conda-lock itself, and instructions on how to work with conda-lock lock files were added as well, that could also resolve things.

Who would use this feature?

People that want to ensure that a Binder link will run far into the future (so maybe the same people that put things on Zenodo).

How much effort will adding it take?

I'm not sure. I would hope not much, but I haven't taken the time to look at how repo2docker currently supports all the config files it already does.

Who can do this work?

Someone with familiarity with conda-lock.

We already have lock files for pinning the base requirements, though these aren't yaml files: https://github.com/jupyterhub/repo2docker/tree/main/repo2docker/buildpacks/conda Is this a different type of lock file?

@manics This might be a conda-lock version issue. The conda-lock format was unified in conda-lock v1.0.0 (cf. https://github.com/conda/conda-lock/pull/124)

https://github.com/conda/conda-lock/blob/425b384ffd010461d9a4f3c61d286e31a21f14f3/README.md?plain=1#L68-L76

By default, conda-lock stores its output in conda-lock.yml in the current working directory. This file will also be used by default for render, install, and update operations. You can supply a different filename with e.g.
conda-lock --lockfile superspecial.conda-lock.yml

It seems though, that yes, the format of what you have is different. Example:

https://github.com/jupyterhub/repo2docker/blob/8c32db99878fa3cd532f2b9ee107cfded058088a/repo2docker/buildpacks/conda/environment.py-3.11-linux-64.lock#L1-L10

compared to something like https://iris-hep.org/analysis-systems-env-nightlies/iris-hep-rc/3.11/conda-lock.yml

# This lock file was generated by conda-lock (https://github.com/conda/conda-lock). DO NOT EDIT!
#
# A "lock file" contains a concrete list of package versions (with checksums) to be installed. Unlike
# e.g. `conda env create`, the resulting environment will not change as new package versions become
# available, unless you explicitly update the lock file.
#
# Install this environment as "YOURENV" with:
#     conda-lock install -n YOURENV --file conda-lock.yml
# To update a single package to the latest version compatible with the version constraints in the source:
#     conda-lock lock  --lockfile conda-lock.yml --update PACKAGE
# To re-solve the entire environment, e.g. after changing a version constraint in the source file:
#     conda-lock -f iris-hep-rc/3.11/environment.yml --lockfile conda-lock.yml
version: 1
metadata:
  content_hash:
    linux-64: e002febb8b04300e80dded8f2b7dabb269ace11a83f98db20719007774f0f52c
  channels:
  - url: conda-forge
    used_env_vars: []
  platforms:
  - linux-64
  sources:
  - iris-hep-rc/3.11/environment.yml
package:
- name: _libgcc_mutex
  version: '0.1'
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/_libgcc_mutex-0.1-conda_forge.tar.bz2
  hash:
    md5: d7c89558ba9fa0495403155b64376d81
    sha256: fe51de6107f9edc7aa4f786a70f4a883943bc9d39b3bb7307c04c41410990726
  category: main
  optional: false
- name: ca-certificates
  version: 2023.7.22
  manager: conda
  platform: linux-64
  dependencies: {}
  url: https://conda.anaconda.org/conda-forge/linux-64/ca-certificates-2023.7.22-hbcca054_0.conda
  hash:
    md5: a73ecd2988327ad4c8f2c331482917f2
    sha256: 525b7b6b5135b952ec1808de84e5eca57c7c7ff144e29ef3e96ae4040ff432c1
  category: main
  optional: false
...

(edit)

Ah yes, here we go:

https://github.com/conda/conda-lock/blob/425b384ffd010461d9a4f3c61d286e31a21f14f3/README.md?plain=1#L57-L64

Pre 1.0 compatible usage (explicit per platform locks)

If you were making use of conda-lock before the 1.0 release that added unified lockfiles you can still get that behaviour by making use of the explicit output kind.
conda-lock --kind explicit -f environment.yml

So it seems that you're using the pre-v1.0 explicit lock file format over the v1.0+ unified lockfile.
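For tooling that needs to tell the two formats apart, here is a minimal Python sketch. It assumes that explicit lock files carry an `@EXPLICIT` marker line and that unified lock files have the top-level keys `version`, `metadata`, and `package` seen in the example above. `lockfile_kind` is a hypothetical helper name, not part of conda-lock or repo2docker, and the line-based scan is a stand-in for a real YAML parser.

```python
def lockfile_kind(text: str) -> str:
    """Classify lock file text as 'unified', 'explicit', or 'unknown'.

    Assumption: unified (v1.0+) files have top-level `version`, `metadata`
    and `package` keys; explicit files contain an `@EXPLICIT` marker line.
    """
    top_level_keys = set()
    for raw in text.splitlines():
        line = raw.rstrip()
        # Blank lines and comments never decide the format.
        if not line or line.lstrip().startswith("#"):
            continue
        if line.strip() == "@EXPLICIT":
            return "explicit"
        # A top-level YAML key starts in column 0 and is not a list item.
        if not line[0].isspace() and not line.startswith("-") and ":" in line:
            top_level_keys.add(line.split(":", 1)[0])
    if {"version", "metadata", "package"} <= top_level_keys:
        return "unified"
    return "unknown"
```

An ordinary environment.yml has none of the three unified keys and no marker line, so it is reported as `unknown` rather than being mistaken for a lock file.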

Supporting conda-lock outputs would be... nice, but would likely require some guardrails, and blessing some "r2d knows best" conventions.

conda-lock itself hauls in... a lot of dependencies, so may not be a good candidate for the "base coat" environment. micromamba, already present, is certainly up to the task of consuming both formats... though pixi very well might end up "winning" for this use case.

As, ideally, it would replace (not change) the notebook environment, support for the raw lock (in either format) would ideally be able to preflight before doing a still-expensive download by:

- checking for some marker that is compatible with the runtime (e.g. linux-64) in some non-cosmetic place
  - comments and filenames can't count
  - but really, the presence of e.g. /linux-64/ in any URL would probably be enough...
  - i don't think a fully /noarch/ environment is even creatable at this time for anything except a "dataset" environment
- checking that it contains e.g. jupyterhub[singleuser] at a version that would be at least compatible with the hosting hub...
  - running micromamba install jupyterhub-singleuser could upgrade something in an undesirable way, even with a bunch of flags on it

The yml format directly supports dependencies from other package managers, like pip (and even other package manager managers like poetry and pipenv), while the @EXPLICIT format kinda half-supports them, but behind #s, so probably needs to be ignored entirely.

Thus far, there is no specific naming convention for A Well-Known Conda Lock File in a repo, as a number of "first-party" tools within the conda org don't agree on what the extension should even be:

- the reference implementation (conda itself) does not (and might not ever) support the .yml format
  - conda env won't support an @EXPLICIT file, only conda create
- by default, conda-lock generates lockfiles like conda-lock.yml or conda-{platform}.lock
  - both can be overridden, though the YAML format must end in .yml
- constructor expects a .txt file for an @EXPLICIT (but has no guidance on the prefix)
  - it doesn't support the .yml format, even if micromamba is used as the solver, and would try to use it like an environment.yml

So, to tighten up the above as a recommendation:

while a file has not been selected, for each of the below opinions (in basically this order, assuming linux-64):
```
.binder/conda-lock.yml
.binder/conda-linux-64.lock
.binder/conda-linux-64.lock.txt
binder/conda-lock.yml
binder/conda-linux-64.lock
binder/conda-linux-64.lock.txt
conda-lock.yml
conda-linux-64.lock
conda-linux-64.lock.txt
```
- if the file ends in .yml
  - and linux-64 is found in #/metadata/platforms/
  - and a member of #/package/ contains name: jupyterhub-singleuser
  - select this file
- otherwise, if the file contains /linux-64/
  - and the file contains jupyterhub-singleuser
  - select this file

if no file is selected, fail

COPY {the-file} /tmp/

micromamba env create --prefix {wherever/it/goes} --file /tmp/{the-file} && micromamba clean -yaf
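The selection steps above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how repo2docker works today. `select_lockfile` and its helpers are hypothetical names, and the .yml check is a line-based scan that assumes the layout conda-lock emits (as in the example earlier in this thread) instead of using a real YAML parser. It also reads "otherwise" as applying only to files that do not end in .yml.

```python
from pathlib import Path
from typing import List, Optional


def candidates(platform: str = "linux-64") -> List[str]:
    """Candidate lock file paths, in the order of preference listed above."""
    names = ("conda-lock.yml", f"conda-{platform}.lock", f"conda-{platform}.lock.txt")
    return [(Path(d) / n).as_posix() for d in (".binder", "binder", "") for n in names]


def _yml_is_usable(text: str, platform: str, required: str) -> bool:
    """Line-based check of #/metadata/platforms/ and #/package/ (assumed layout)."""
    section = None          # current top-level key
    in_platforms = False    # inside the metadata.platforms list?
    has_platform = has_required = False
    for raw in text.splitlines():
        line, stripped = raw.rstrip(), raw.strip()
        if not stripped or stripped.startswith("#"):
            continue        # comments can't count
        if not line[0].isspace() and not line.startswith("-"):
            section, in_platforms = stripped.split(":", 1)[0], False
        elif section == "metadata":
            if stripped.endswith(":") and not stripped.startswith("-"):
                in_platforms = stripped == "platforms:"
            elif in_platforms and stripped == f"- {platform}":
                has_platform = True
        elif section == "package" and stripped == f"- name: {required}":
            has_required = True
    return has_platform and has_required


def _explicit_is_usable(text: str, platform: str, required: str) -> bool:
    """Plain-text check for explicit lock files, ignoring comment lines."""
    payload = [l for l in text.splitlines() if l.strip() and not l.lstrip().startswith("#")]
    return any(f"/{platform}/" in l for l in payload) and any(required in l for l in payload)


def select_lockfile(repo_root: str, platform: str = "linux-64",
                    required: str = "jupyterhub-singleuser") -> Optional[str]:
    """Return the first usable candidate (relative path), or None if none qualifies."""
    root = Path(repo_root)
    for rel in candidates(platform):
        path = root / rel
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        check = _yml_is_usable if rel.endswith(".yml") else _explicit_is_usable
        if check(text, platform, required):
            return rel
    return None
```

A caller would treat a `None` result as the "fail" case, and otherwise hand the returned path to the COPY and micromamba steps.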

Chiming in here with a user experience, leading to a question about the above recommendation. My goals are to keep only my project's dependencies in an environment.yml with minimal pinning, have some lockfile for protection against untested updates, and to not conflict with packages added by the conda buildpack. I do not understand how I can create or use a lockfile that is aware of the package constraints introduced in the conda buildpack. Wouldn't the recommendation, which uses create rather than update, require me to include jupyterhub-singleuser and friends with all the repo2docker constraints? If it were update though, how/where/when could I invoke conda-lock on my environment.yml and repo2docker's environment.yml?

Aside: If not for some few packages that seem to need the notebook kernel to be the same as the environment running jupyterhub-singleuser, I would have used a separate environment for my project's kernel.
