ewels opened 1 year ago
I think `nextflow inspect` does that:

```console
$ nextflow inspect main.nf -profile local
{
    "processes": [
        {
            "name": "r2_CELL_CYCLE_SCORING_AND_PCA",
            "container": "wave.seqera.io/wt/4fc019059a1f/wave/build:create_objects--c32b27bc3124db00"
        },
        ...
```
So we just hook `nextflow inspect` into `nf-core download`. When they're running `nf-core download`, they should have an internet connection, right? Worst case, we export the containers on release and commit the JSON updates to the repos!
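For illustration, a minimal sketch of what that hook could look like (the function name and defaults here are assumptions, not the real nf-core code):

```python
import json
import subprocess

def fetch_container_uris(pipeline_dir: str, profile: str = "docker") -> dict:
    """Run `nextflow inspect` and map each process name to its resolved
    container URI (same JSON shape as the example above)."""
    result = subprocess.run(
        ["nextflow", "inspect", pipeline_dir, "-profile", profile],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    return {proc["name"]: proc["container"] for proc in data["processes"]}
```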
Yeah exactly, that's essentially my option 2 - fetch the container URIs at the point of download (or release) and have an associated config file that specifies the container URIs.
It basically means that offline users won't be using Wave at all, it's just a regular Nextflow run with containers as usual, but maybe this is the best solution. My main issue with it is that it forces people to use `nf-core download`.
I'm inclined to option 2 too. The `nextflow inspect` command was made with this possibility in mind.
> It basically means that offline users won't be using Wave at all, it's just a regular Nextflow run with containers as usual, but maybe this is the best solution. My main issue with it is that it forces people to use `nf-core download`.
Would users need to use Wave at all, besides checking whether an image has been created? I was having that issue where it was returning the image name before it even got built (i.e. `quay.io/nf-core/modules/bowtie:bowtie-1.3.0_samtools-1.16.1--82705d624eee2198`). So it should be able to go out and look for that image (I'm guessing right now it's auth-ing with the repo through Tower Platform).
But if we could tweak the behavior slightly (it might already be this):
What if we ran `nextflow inspect` in CI in the pipelines on release, and had a `containers.json` that got generated? Every single commit wouldn't be reproducible, but the releases would be `nf-core download`-able.
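As a sketch, the CI step could be as simple as capturing the inspect output verbatim (the profile name and file path here are assumptions):

```python
import json
import subprocess

# Hypothetical release-time CI step: resolve the Wave container URIs once,
# while online, and commit the result to the repo as containers.json.
result = subprocess.run(
    ["nextflow", "inspect", ".", "-profile", "docker"],
    capture_output=True, text=True, check=True,
)
inspected = json.loads(result.stdout)  # same JSON shape as the example above
with open("containers.json", "w") as fh:
    json.dump(inspected, fh, indent=4)
```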
I think that's a good compromise. It would vastly simplify the container downloading logic in `nf-core download`.
https://github.com/seqeralabs/nf-aggregate/pull/43 Basically this 😆
Wave in Nextflow is beautifully simple - no need to define `container` URIs, just the conda package names and we get everything for free. However, for wide adoption (or at least, adoption in @nf-core), we need to support offline usage of pipelines.

For offline work, the process is typically as follows:

1. Download the pipeline code and all associated container images on a machine with internet access (e.g. with `nf-core download`).
2. Transfer everything to the offline system.
3. Run the pipeline there, using only the local copies.
This hinges on Nextflow checking the local container cache (e.g. `NXF_SINGULARITY_CACHEDIR`) for images before attempting to download them. Things like Singularity container filenames are predictable, so it's easy for us to wrap download functionality into tooling like `nf-core download` and make sure that they are available.
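To illustrate what "predictable" buys us, here's a rough sketch of the kind of URI-to-filename mapping this relies on (illustrative only; the exact rules live in nf-core/tools):

```python
import os
import re

def cache_filename(container_uri: str, cache_dir: str) -> str:
    # Sketch: derive a local Singularity image filename from a container
    # URI, roughly in the style nf-core download uses.
    name = re.sub(r"^(?:https?|docker|oras)://", "", container_uri)
    name = re.sub(r"[:/]", "-", name) + ".img"
    return os.path.join(cache_dir, name)

# 'quay.io/biocontainers/fastqc:0.11.9--0'
#   -> '<cache>/quay.io-biocontainers-fastqc-0.11.9--0.img'
```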
However, this assumption breaks with Wave. Currently, Nextflow needs to reach out to the Wave service (online) to find out the `container` URI and the resulting local cache filename. So without an internet connection, it doesn't know where to check locally.

As I see it, we have two options:
1. Make Wave container names deterministic, so that `container` URIs could be built offline and everything would work.
2. Use `nf-core download` to write container URIs to a Nextflow config file, fetch the container images, and bundle this config with the pipeline somehow so that it works without further configuration by the users.
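For option 2, a minimal sketch of the config-writing step (file names are assumptions; newer Nextflow versions may be able to emit this directly via `nextflow inspect -format config`, worth checking):

```python
import json

# Hypothetical helper: turn the resolved URIs into a Nextflow config that
# pins each process to its container, so an offline run never needs Wave.
with open("containers.json") as fh:
    processes = json.load(fh)["processes"]

with open("containers.config", "w") as fh:
    for proc in processes:
        fh.write(
            f"process {{ withName: '{proc['name']}' "
            f"{{ container = '{proc['container']}' }} }}\n"
        )
```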