nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.
https://nf-co.re/raredisease
MIT License

bash: singularity: command not found #575

hrydbeck closed this issue 2 months ago

hrydbeck commented 3 months ago

Description of the bug

I get "bash: singularity: command not found" when running the pipeline on Slurm.

Command used and terminal output

nextflow run nf-core/raredisease -r "patch" -profile test,singularity -c ./halfdan_pdc.config --outdir test_rev_patch_out --project naiss2024-22-481

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:SMNCOPYNUMBERCALLER (1)'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name depot.galaxyproject.org-singularity-smncopynumbercaller-1.1.2--py310h7cba7a3_0.img.pulling.1719305149448 https://depot.galaxyproject.org/singularity/smncopynumbercaller:1.1.2--py310h7cba7a3_0 > /dev/null
  status : 127
  message:
    bash: singularity: command not found

Relevant files

command.md, nextflow.log, halfdan_pdc.config.txt

System information

- Nextflow version: 23.10.1
- Hardware: HPC (PDC-Dardel)
- Executor: slurm
- Container engine: Singularity
- OS: SUSE Linux Enterprise Server 15 SP5
- Version of nf-core/raredisease: "patch"

asp8200 commented 3 months ago

This may be a silly question, @hrydbeck, but are you sure you have Singularity available on your system (and also on the compute nodes)? I get that error when I try to kick off the pipeline without first having made Singularity available.
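Exit status 127 is the code bash returns when it cannot find a command, so the first error means `singularity` was not on `PATH` in the shell that ran `singularity pull`. As far as I understand, Nextflow pulls Singularity images from the main Nextflow process (the shell where `nextflow run` was started, here the login node) before it submits tasks to Slurm, so modules that the config loads for the compute jobs would not help with the pull itself. A minimal sketch of a check to run in that shell, and again inside a Slurm job; `require_cli` is a hypothetical helper, not part of Nextflow or nf-core:

```bash
#!/usr/bin/env bash
# Hypothetical preflight helper, not part of Nextflow or nf-core.
# Prints where a command resolves on PATH, or fails with status 127,
# the same status bash reports for "command not found".
require_cli() {
  local cli=$1 path
  if path=$(command -v "$cli"); then
    echo "$cli found at $path"
    return 0
  fi
  echo "$cli: not found on PATH ($PATH)" >&2
  return 127
}
```

For example, putting `require_cli singularity || exit` at the top of a launch script stops it before `nextflow run` is reached when the module has not been loaded.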

hrydbeck commented 3 months ago

Hi @asp8200, and thank you for helping out. I find it difficult to grasp the chain of events when calling Nextflow with a configuration file that specifies Slurm as the executor. Since there is a section in "halfdan_pdc.config" (modified from https://nf-co.re/configs/pdc_kth) that loads the needed modules, including apptainer/singularity, I believed that would make singularity available on every node used. But after your email, I also tried loading the needed modules before submitting the Nextflow command. Then I get a different error message:

ERROR ~ Error executing process > 'NFCORE_RAREDISEASE:RAREDISEASE:CALL_MOBILE_ELEMENTS:RETROSEQ_DISCOVER (7)'

Caused by:
  Failed to pull singularity image
  command: singularity pull --name docker.io-clinicalgenomics-retroseq-1.5_9d4f3b5-1.img.pulling.1719391594730 docker://docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1 > /dev/null
  status : 255
  message:
    INFO: Converting OCI blobs to SIF format
    INFO: Starting build...
    INFO: Fetching OCI image...
    INFO: Extracting OCI image...
    2024/06/26 10:47:39 warn xattr{var/cache/apt/archives/partial} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
    2024/06/26 10:47:39 warn xattr{/tmp/build-temp-3733473279/rootfs/var/cache/apt/archives/partial} destination filesystem does not support xattrs, further warnings will be suppressed
    FATAL: While making image from oci registry: error fetching image to cache: while building SIF from layers: packer failed to pack: while unpacking tmpfs: error unpacking rootfs: unpack entry: opt/conda/pkgs/certifi-2019.9.11-py37_0/lib/python3.7/site-packages/certifi-2019.9.11-py3.7.egg-info/PKG-INFO: link: unpriv.link: unpriv.wrap target: no such file or directory

Could this have anything to do with the following note from PDC?

"Singularity is installed as a nosuid on the cluster, meaning that you are unable to use singularity files, but are able to use singularity sandboxes instead."?

https://www.pdc.kth.se/support/documents/software/singularity.html#installation-of-singularity

asp8200 commented 3 months ago

Hmm... Good observation about the singularity-sandbox requirement on your HPC. I noticed that when Nextflow runs modules with Singularity, Singularity will (often) output the message "INFO: Convert SIF file to sandbox...", so perhaps Nextflow executes the Singularity images "in a sandbox". (Here I'm writing about stuff I don't know anything about 😆)

I have these ideas:

1) Do some basic tests of Singularity on the login/head node (if possible). Can you, for instance, run the following command?

singularity pull --name docker.io-clinicalgenomics-retroseq-1.5_9d4f3b5-1.img.pulling.1719391594730 docker://docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1

2) Run the same tests on the compute nodes (perhaps by using Slurm to submit the tests to the compute nodes).

3) It might be that Singularity has problems pulling the images from the net. I always fetch the needed Singularity images first by running nf-core download raredisease.

jemten commented 3 months ago

I think you're right, @asp8200. I sometimes run into issues when converting Docker images to Singularity on the compute nodes. Depending on the image, the conversion from Docker to Singularity can take quite a lot of memory, so a process might have enough memory to run the container but not enough to do the conversion. Running nf-core download, or executing a stub run on the login node, where you usually have more memory available, could solve the issue.
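The pre-fetching idea can also be sketched by hand for anyone who cannot use nf-core download: pull each image once on the login node into the directory Nextflow uses as its Singularity cache (set with the `NXF_SINGULARITY_CACHEDIR` environment variable or `singularity.cacheDir` in the config), so that no conversion has to happen inside the compute jobs. Nextflow only reuses a cached image if the file has the name it expects; the naming rule in `image_file_name` is inferred from the file names in the logs in this thread (scheme dropped, `/` and `:` replaced by `-`, `.img` appended) and is an assumption, not documented behaviour. Both functions and the `images.txt` list of image URIs (one per line) are hypothetical; `nf-core download raredisease` remains the supported route, and it is what solved the problem here.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of Nextflow, nf-core or Singularity.

# Cache file name Nextflow appears to expect for an image URI.
# Assumption: rule inferred from the file names in this thread's logs.
image_file_name() {
  local uri=${1#*://}      # drop the scheme (docker://, https://)
  uri=${uri//\//-}         # replace every '/' with '-'
  uri=${uri//:/-}          # replace every ':' with '-'
  printf '%s.img\n' "$uri"
}

# Pull every image URI listed in file $1 (one per line) into cache dir $2,
# skipping images that are already cached. Meant to run on the login node.
prefetch_images() {
  local list=$1 cache=$2 uri name
  mkdir -p "$cache" || return
  while IFS= read -r uri || [ -n "$uri" ]; do
    [ -z "$uri" ] && continue              # ignore blank lines
    name=$(image_file_name "$uri")
    if [ -e "$cache/$name" ]; then
      echo "cached: $name"
      continue
    fi
    echo "pulling: $uri"
    # </dev/null keeps singularity from reading the rest of the list on stdin
    singularity pull "$cache/$name" "$uri" </dev/null || return
  done < "$list"
}
```

For example, on the login node: `export NXF_SINGULARITY_CACHEDIR=/path/to/cache` (placeholder path), then `prefetch_images images.txt "$NXF_SINGULARITY_CACHEDIR"`, and launch `nextflow run` from a shell where that variable is still exported.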

hrydbeck commented 2 months ago

I think nf-core download raredisease did the trick. I now get a new error message, which I will report in a new issue.