nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0
2.68k stars 621 forks

Support unified lock files from conda-lock #5220

Closed pinin4fjords closed 3 days ago

pinin4fjords commented 1 month ago

New feature

Nextflow already supports the usage of platform-specific lock files for conda-lock < 1.0. It does this via the usual conda env commands:

conda env create --file conda-lock-linux-64.lock

Newer versions of conda-lock generate a 'unified' format, allowing for the environment of multiple platforms to be specified in the same file. This format is not compatible with conda env, and must be 'rendered' back to the older single-platform style for use with Conda. The unified format can be used to create an environment, but it must be done with the conda-lock command:

conda-lock install -n conda-lock-test conda-lock.yml

But having a single file defining the frozen environment across platforms is nice. Users don't have to track a lock file for every platform, and having the 'lock' process run just the once means that lock files for different platforms are less likely to drift relative to each other, which is better for reproducibility.

So it would be nice if we could support unified lock files via conda-lock, rather than just platform-specific lock files via conda env.

Usage scenario

Users (e.g. the nf-core community) could run conda-lock just once for a given module, based on an environment.yml. platform entries in the yml can be used to define supported platforms. The resulting single multi-platform lock file can then be stored alongside the environment.yml, and optionally used in place of the environment.yml.

Suggest implementation

  1. Nextflow recognises *.lock.yml
  2. Nextflow verifies that conda-lock is available
  3. Nextflow runs conda-lock install in place of conda env create
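
A minimal sketch of the three steps above, written in Python purely for illustration (Nextflow itself is not written in Python, and this is not the code from any Nextflow pull request). The function names `is_unified_lock_file` and `conda_create_command` are hypothetical, as is the choice to install into a prefix directory with `--prefix` rather than a named environment.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def is_unified_lock_file(path: str) -> bool:
    """Step 1: treat any file named *.lock.yml as a unified conda-lock file.

    The pattern is the one suggested in this issue; it is an assumption,
    not an established Nextflow convention.
    """
    return Path(path).name.endswith(".lock.yml")


def conda_create_command(env_file: str, prefix: str, which=shutil.which) -> list[str]:
    """Return the command line that would create the environment at `prefix`.

    `which` is injectable so the availability check can be exercised
    without conda-lock being installed.
    """
    if is_unified_lock_file(env_file):
        # Step 2: verify that the conda-lock executable is on PATH.
        if which("conda-lock") is None:
            raise RuntimeError(
                f"{env_file} looks like a unified lock file, "
                "but conda-lock was not found on PATH"
            )
        # Step 3: use conda-lock install in place of conda env create.
        return ["conda-lock", "install", "--prefix", prefix, env_file]
    # Anything else keeps the existing behaviour described above.
    return ["conda", "env", "create", "--prefix", prefix, "--file", env_file]


if __name__ == "__main__":
    # A platform-specific lock file still goes through conda env create.
    print(" ".join(conda_create_command("conda-lock-linux-64.lock", "/tmp/env")))
```

One thing the sketch makes visible: conda-lock's default output name, conda-lock.yml, does not match *.lock.yml, so either the pattern would need widening or users would need to rename the file.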

pditommaso commented 1 month ago

It depends where we want to go with this feature. While lock files are great for reproducibility, they have two important drawbacks:

  1. it is a huge amount of work to maintain them manually for each module/process
  2. when using a lock file (as well as an environment file) in the conda directive, the process dependencies are completely obfuscated.

(I also fear the lock file is resolved against your local Conda configuration, so it may not even be fully reproducible).

My ideal solution would be that the user still defines process deps via plain packages in the conda directive; Wave should then resolve the lock file, use it to build the corresponding container, and make it accessible in the build metadata.

pinin4fjords commented 1 month ago

I'd argue that 1) is resolvable with some tooling, e.g. in nf-core. We can easily have CI rebuild lockfiles on every change to environment.ymls (that's what Edmund was up to).

For 2), do you mean that you dislike the statement of dependencies outside of the main.nf?

If as you say the lock files are not even portable across machines then my thinking on using them is dead in the water, so we should do some more testing there. But if it was to work I think there will be a lot of people who would appreciate the ability to use conda in a 'frozen' way without dependency resolution, without having container runtimes available.

pinin4fjords commented 1 month ago

(but I won't debug https://github.com/nextflow-io/nextflow/pull/5221 any further if you're not in favour of this)

pditommaso commented 1 month ago

Re 2), my point is that the current implementation is the best we can do on the Nextflow side. I believe better support should be provided on the Wave side.

pditommaso commented 1 month ago

Was looking into this, and interestingly enough Pixi has built-in support for lock files: https://github.com/seqeralabs/wave/issues/521#issuecomment-2284950474

pditommaso commented 3 days ago

Implementing this on the Wave side: https://github.com/seqeralabs/wave/issues/172