nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License
244 stars 192 forks source link

Container cache download #3163

Open muffato opened 2 months ago

muffato commented 2 months ago

Fixes #3019 and #3162.

First #3162. The code has to deal with an out_path (always defined) and sometimes a cache_path too. The implementation was slightly convoluted as it was doing fresh downloads into the cache_path or out_path (first decision point) and then doing an extra copy if needed (second decision point). Because of that confusion, symlinks across container registries were not all created across both locations. I propose to reverse the logic to make it more straightforward:

  1. Always download into out_path and create its symlinks.
  2. Then, optionally copy to cache_path (and create symlinks there).

Then, #3019: I propose to handle the singularity "library" directory this way:

  1. Just like in Nextflow itself, it's considered a read-only location. It means that containers can only be copied from it, not to it, and that we shouldn't be even creating symlinks there.
  2. There is no point in having a --container-library-utilisation parameter for the library because i) remote would be redundant with the --container-cache-utilisation's remote mode, ii) amend is not possible as per the read-only rule, so iii) copy is the only possible mode.
  3. Therefore, the most natural place to use the library is as a source of containers, in parallel of https downloads and singularity pull. When NXF_SINGULARITY_LIBRARYDIR is set and the container exists in the library, it is copied to the target directories (out_path and possibly cache_path too)

PR checklist

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 0% with 16 lines in your changes missing coverage. Please review.

Project coverage is 75.52%. Comparing base (8e47a33) to head (78650af).

Files with missing lines Patch % Lines
nf_core/pipelines/download.py 0.00% 16 Missing :warning:
Additional details and impacted files

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

muffato commented 2 months ago

The code is there, but I still need to add unit-tests and documentation

ewels commented 1 week ago

@muffato - do you think you'll have time to take a look at this PR in the near(ish) future? It'd be great to get it moving and merged if possible..

muffato commented 1 week ago

Hi @ewels . Unfortunately there's too much work left to do on this PR. I can't see myself getting to the bottom of it any time soon.

ewels commented 5 days ago

No problem - hopefully @mirpedrol can take it over then when she has time, if that's ok 🙏🏻

MatthiasZepper commented 5 days ago

No problem - hopefully @mirpedrol can take it over then when she has time, if that's ok 🙏🏻

Frankly, I consider this feature expendable. It is nice to save some download bandwidth and storage space, but the poor Seqera Containers support is haunting me much more (and also wasting significant storage).

Considering that Júlia has the most extensive knowledge of the nf-core tools codebase, I would rather suggest / entreat / beg her for working on tooling around the container.yml creation, linting and release logic. As soon as each new pipeline release ships with it, we can adopt it on the download side as well.