nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License
241 stars 191 forks source link

Use singularity containers for apptainer profile too #2357

Open SPPearce opened 1 year ago

SPPearce commented 1 year ago

Description of feature

Singularity has split out into an open source version called Apptainer two years ago.

Currently the logic for using singularity containers for modules is hardcoded to require the use of the singularity container profile, e.g.

 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
        'nf-core/ubuntu:20.04' }"

so the apptainer profile will build from the docker containers instead, which is less robust and more likely to fail. It would therefore be good to enable the singularity images for apptainer too. This was discussed in on Slack, with this summary from Phil @ewels :

Yup, so in summary: Short term fix to just get things working is apptainer.registry, now merged and will go out in release imminently TODO: Medium term fix: We should write up some docs explaining about the singularity command alias and how we can trick the system to work as you described. TODO: Long term fix: Update logic in all modules, plus all container parsing logic, plus all infra, to handle apptainer profile name :see_no_evil: May be worth discussing using a param instead of relying on workflow.containerEngine for better future proofing / more concise code? :information_source: Note that Nextflow will hopefully give us a list of container names soon, meaning a significant rewrite for the nf-core download infra, may make sense to wait for that and do this all in one go

SPPearce commented 1 year ago

I do think that using a parameter instead of workflow.containerEngine is the neater solution, in case there are further options in the future it wouldn't require changing all the modules again.

alexlyttle commented 2 months ago

I have recently started writing a config for our institution's HPC and wondered if there is any planned progress on this issue (particularly "a significant rewrite for the nf-core download infra")? Our HPC uses apptainer, but I was wondering why it was not pulling the pre-built images and occasionally failing. Looking at the docs and other institution config lead me to the warning in the docs:

TODO: Medium term fix: We should write up some docs explaining about the singularity command alias and how we can trick the system to work as you described.

I went with this in the end, switching to singularity in the config and relying on it being an alias for apptainer.

Could this solution cause problems in the future (besides the confusion for others like myself writing an institution config)?

MatthiasZepper commented 2 months ago

This does not really have anything to do with nf-core download. Whether you pre-download the pipeline or directly reference the GitHub repo: If you run a particular revision of a pipeline, it will use whatever container specification was used in the module at the time of finalizing the release.

As of now, unfortunately many of the over 1000 modules in the modules repo have not been updated, so they still have "singularity" hard-coded:

    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/multiqc:1.24.1--pyhdfd78af_0' :
        'biocontainers/multiqc:1.24.1--pyhdfd78af_0' }"

Regardless of any config setting, nf-core download will always download the Apptainer/Singularity images in *.sif format, because that is the only container engine currently supported for pre-downloading. Docker images always need to be pulled at runtime. The tool itself supports many different container notations (including explicit apptainer references), because they changed over time and it in most cases can also deal will really old pipeline versions.

So regarding your config, you should indeed stick to singularity for now. Because whether you have cached the image or not, Nextflow will not use it in that case. Your only alternative option would be to provide a custom config with the pipeline that overrides the container definitions for all processes similar to what nf-core modules will be using in the future.

Rewriting nf-core download is a huge task and unfortunately not really high priority, but will be required when that change is finalised. It, however, does not have anything to do with the problem that you are facing.

alexlyttle commented 2 months ago

Thanks @MatthiasZepper, very useful information!

For some reason I misinterpreted the line "a significant rewrite for the nf-core download infra" to refer to how the modules use singularity/apptainer to pull containers. Since as there are so many modules where singularity is hard-coded, then as you say the config workaround is the way to go for now!

As an aside, when I initially enabled apptainer in the config I was facing loads problems relating to this issue. This occurs when pulling docker images, which was always happening because of the helpful info in the docs. In fact, I have found the nf-core download tool very useful if these issues still happen.

MatthiasZepper commented 2 months ago

For some reason I misinterpreted the line "a significant rewrite for the nf-core download infra" to refer to how the modules use singularity/apptainer to pull containers

On a second thought, that may actually not have been a misinterpretation, because there are indeed significant changes needed as well, since hard-coding the container URIs in the modules shall be removed somewhen in the future. The problem is, that this needs to happen in conjunction with coordinated updates to Nextflow, to Seqera containers and to nf-core tools (for download, linting and a new helper tool to write those configs when a pipeline is released)....and this task feels so daunting that nobody is really driving that change and presumably everyone secretly hopes that somebody else will implement the required changes^^. Plus there is still the open question what to do with tools, that are not available via conda which is, as far as I know, not decided yet.

SPPearce commented 2 months ago

Matthias, do you know of any modules that aren't hard coded with singularity? A bulk update to that line would be straightforward enough, while we have bulk updates to do anyway

MatthiasZepper commented 2 months ago

Frankly...no 😆. With not hard coded with singularity I actually also meant additionally hard coded with apptainer as well.

Since I am not involved in the modules, DSL and linting related parts of nf-core, I also have a very coarse overview of what is actually being discussed and happening there. All I know for sure is, that there was a release PR for nf-core tools more than a year ago, when the apptainer issue popped up in the final reviews. Ultimately, I did a late-night rewrite of nf-core download to accommodate a new planned container declaration that would soon be implemented.

Because people at that time were still arguing over what the new container declaration should look like in modules, nf-core download now supports them both. You can see from the tests what people wanted to implement, but I never bothered to check what conclusion was ultimately reached...until a few days ago when I realized that at least the MuliQC module still has neither of those.

To my best knowledge, those two options (Mock examples) were discussed:

    container "${ (workflow.containerEngine == 'singularity' || workflow.containerEngine == 'apptainer') && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/dsltwoapptainervarone:1.1.0--py38h7be5676_2':
        'biocontainers/dsltwoapptainervarone:1.1.0--py38h7be5676_2' }"
    conda "bioconda::dsltwoapptainervartwo=1.1.0"
    container "${ ['singularity', 'apptainer'].contains(workflow.containerEngine) && !task.ext.singularity_pull_docker_container ?
        'https://depot.galaxyproject.org/singularity/dsltwoapptainervartwo:1.1.0--hdfd78af_0':
        'biocontainers/dsltwoapptainervartwo:1.1.0--hdfd78af_0' }"

If there is a point in bulk-updating all modules, if those definitions are already planned to be removed entirely from the modules in the near(?) future, I do not know. I lean towards putting emphasis on implementing the new logic for having the nice multi-aarch support by Seqera Containers and solving the registry issues with download along the way. But ultimately that is not my business - or as we say in Germany: "Not my circus, not my monkeys" 🙉.

SPPearce commented 2 months ago

Yes, it doesn't seem to have ever been actioned on. As you say, the Seqera containers tools support doesn't seem to actually be being pushed at all..., but when it does happen I agree it should solve this issue directly. It is on the bulk modules roadmap for this "summer" (slipping already!). I use "Not my circus, not my monkeys" regularly in English too, it is a useful phrase I find.

Could this solution cause problems in the future (besides the confusion for others like myself writing an institution config)? @alexlyttle , I don't see an issue in the immediate future, if there becomes one you'd probably discover it relatively quickly when the profile stops working. I have been using the singularity profile with apptainer for over a year now.

alexlyttle commented 2 months ago

I don't see an issue in the immediate future, if there becomes one you'd probably discover it relatively quickly when the profile stops working. I have been using the singularity profile with apptainer for over a year now.

Thanks @SPPearce, good to know. I'll keep an eye out for the Seqera support too.