nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

container creation failed error #894

Closed hmedwards closed 2 years ago

hmedwards commented 2 years ago

Hi there!

I've checked the recommended troubleshooting pages but am unable to find the answer to my problem.

I have tried running eager on a couple of fastq files but hit an error saying container creation failed.

My nf-core/eager command is:

nextflow run nf-core/eager \
-r 2.4.4 \
-profile imperial \
-name 'test11' \
--input '/.../test/*.fq.gz' \
--single_end \
--fasta '/.../he11/Cng_h99_CNA3_suprcont.fasta' \
--bwa_index '/.../he11/' \
--fasta_index '/.../he11/Cng_h99_CNA3_suprcont.fasta.fai' \
--seq_dict '/he11/Cng_h99_CNA3_suprcont.fasta.dict' \
--outdir '/.../nf-core/results/' \
-w '/.../nf-core/work/' \
--complexity_filter_poly_g \
--run_bam_filtering \
--bam_unmapped_type 'fastq' \
--run_metagenomic_screening \
--metagenomic_tool 'malt' \
--database '/.../malt_index/hyrax/' \
--malt_min_support_mode 'percent' \
--run_maltextract \
--maltextract_taxon_list '/.../hyrax.taxon.list' \
--maltextract_ncbifiles '/.../HOPS/Resources' \
--maltextract_destackingoff \
-dsl1

This results in the following error messages:


executor >  pbspro (6)
[-        ] process > makeFastaIndex                 -
[-        ] process > makeSeqDict                    -
[-        ] process > convertBam                     -
[-        ] process > indexinputbam                  -
[ca/7135a1] process > fastqc (S1_L0)                 [100%] 2 of 2 ✔
[-        ] process > fastp                          -
[f0/be33cc] process > adapter_removal (S5_L0)        [100%] 2 of 2 ✔
[-        ] process > post_ar_fastq_trimming         -
[-        ] process > lanemerge                      -
[-        ] process > lanemerge_hostremoval_fastq    -
[-        ] process > fastqc_after_clipping          -
[-        ] process > bwa                            -
[-        ] process > bwamem                         -
[-        ] process > circulargenerator              -
[-        ] process > circularmapper                 -
[-        ] process > bowtie2                        -
[-        ] process > hostremoval_input_fastq        -
[-        ] process > seqtype_merge                  -
[-        ] process > samtools_flagstat              -
[-        ] process > samtools_filter                -
[-        ] process > samtools_flagstat_after_filter -
[-        ] process > endorSpy                       -
[-        ] process > dedup                          -
[-        ] process > markduplicates                 -
[-        ] process > library_merge                  -
[-        ] process > preseq                         -
[-        ] process > bedtools                       -
[-        ] process > damageprofiler                 -
[-        ] process > mapdamage_rescaling            -
[-        ] process > mask_reference_for_pmdtools    -
[-        ] process > pmdtools                       -
[-        ] process > bam_trim                       -
[-        ] process > additional_library_merge       -
[-        ] process > qualimap                       -
[-        ] process > genotyping_ug                  -
[-        ] process > genotyping_hc                  -
[-        ] process > genotyping_freebayes           -
[-        ] process > genotyping_pileupcaller        -
[-        ] process > eigenstrat_snp_coverage        -
[-        ] process > genotyping_angsd               -
[-        ] process > bcftools_stats                 -
[-        ] process > vcf2genome                     -
[-        ] process > multivcfanalyzer               -
[-        ] process > mtnucratio                     -
[-        ] process > sexdeterrmine_prep             -
[-        ] process > sexdeterrmine                  -
[-        ] process > nuclear_contamination          -
[-        ] process > print_nuclear_contamination    -
[-        ] process > metagenomic_complexity_filter  -
[-        ] process > malt                           -
[-        ] process > maltextract                    -
[-        ] process > kraken                         -
[-        ] process > kraken_parse                   -
[-        ] process > kraken_merge                   -
[cf/02c97f] process > output_documentation           [100%] 1 of 1, failed: 1 ✘
[69/4f93b3] process > get_software_versions          [100%] 1 of 1, failed: 1 ✘
[-        ] process > multiqc                        -
-[nf-core/eager] Pipeline completed with errors-
Error executing process > 'get_software_versions'

Caused by:
  Process `get_software_versions` terminated with an error exit status (255)

Command executed:

  echo 2.4.4 &> v_pipeline.txt
  echo 22.04.0 &> v_nextflow.txt

  fastqc --version &> v_fastqc.txt 2>&1 || true
  AdapterRemoval --version  &> v_adapterremoval.txt 2>&1 || true
  fastp --version &> v_fastp.txt 2>&1 || true
  bwa &> v_bwa.txt 2>&1 || true
  circulargenerator --help | head -n 1 &> v_circulargenerator.txt 2>&1 || true
  samtools --version &> v_samtools.txt 2>&1 || true
  dedup -v &> v_dedup.txt 2>&1 || true
  ## bioconda recipe of picard is incorrectly set up and extra warning made with stderr, this ugly command ensures only version exported
  ( exec 7>&1; picard MarkDuplicates --version 2>&1 >&7 | grep -v '/' >&2 ) 2> v_markduplicates.txt || true
  qualimap --version &> v_qualimap.txt 2>&1 || true
  preseq &> v_preseq.txt 2>&1 || true
  gatk --version 2>&1 | grep '(GATK)' > v_gatk.txt 2>&1 || true
  gatk3 --version 2>&1 | head -n 1 > v_gatk3.txt 2>&1 || true
  freebayes --version &> v_freebayes.txt 2>&1 || true
  bedtools --version &> v_bedtools.txt 2>&1 || true
  damageprofiler --version &> v_damageprofiler.txt 2>&1 || true
  bam --version &> v_bamutil.txt 2>&1 || true
  pmdtools --version &> v_pmdtools.txt 2>&1 || true
  angsd -h |& head -n 1 | cut -d ' ' -f3-4 &> v_angsd.txt 2>&1 || true
  multivcfanalyzer --help | head -n 1 &> v_multivcfanalyzer.txt 2>&1 || true
  malt-run --help |& tail -n 3 | head -n 1 | cut -f 2 -d'(' | cut -f 1 -d ',' &> v_malt.txt 2>&1 || true
  MaltExtract --help | head -n 2 | tail -n 1 &> v_maltextract.txt 2>&1 || true
  multiqc --version &> v_multiqc.txt 2>&1 || true
  vcf2genome -h |& head -n 1 &> v_vcf2genome.txt || true
  mtnucratio --help &> v_mtnucratiocalculator.txt || true
  sexdeterrmine --version &> v_sexdeterrmine.txt || true
  kraken2 --version | head -n 1 &> v_kraken.txt || true
  endorS.py --version &> v_endorSpy.txt || true
  pileupCaller --version &> v_sequencetools.txt 2>&1 || true
  bowtie2 --version | grep -a 'bowtie2-.* -fdebug' > v_bowtie2.txt || true
  eigenstrat_snp_coverage --version | cut -d ' ' -f2 >v_eigenstrat_snp_coverage.txt || true
  mapDamage --version > v_mapdamage.txt || true
  bbduk.sh | grep 'Last modified' | cut -d ' ' -f 3-99 > v_bbduk.txt || true
  bcftools --version | grep 'bcftools' | cut -d ' ' -f 2 > v_bcftools.txt || true
  scrape_software_versions.py &> software_versions_mqc.yaml

Command exit status:
  255

Command output:
  (empty)

Command error:
  FATAL:   container creation failed: mount /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4->/.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 error: while mounting /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4: destination /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 doesn't exist in container

Work dir:
  /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

and:

Exception in thread "Thread-4" groovy.lang.GroovyRuntimeException: exception while reading process stream
        at org.codehaus.groovy.runtime.ProcessGroovyMethods$TextDumper.run(ProcessGroovyMethods.java:500)
        at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Stream closed
        at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
        at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
        at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
        at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
        at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
        at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
        at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
        at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
        at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
        at org.codehaus.groovy.runtime.ProcessGroovyMethods$TextDumper.run(ProcessGroovyMethods.java:493)
        ... 1 more

I'd appreciate any help on this. Many thanks.

jfy133 commented 2 years ago

Hi @hmedwards, I'm not sure here unfortunately.

Looking at the imperial profile, it seems you are using singularity.

The error here:

  FATAL:   container creation failed: mount /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4->/.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 error: while mounting /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4: destination /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4 doesn't exist in container

Suggests to me that Singularity is having a problem mounting your filesystem into the container, so the software itself can't access the directories... but this isn't related to eager nor Nextflow (as far as I can tell).

I see that @combiz wrote the Imperial nf-core profile; maybe he has some suggestions?

jfy133 commented 2 years ago

In addition

Caused by: java.io.IOException: Stream closed

Maybe the node the job was running on had an interruption to the shared filesystem or something?

I say that because I can see some steps of the pipeline finished correctly, and get_software_versions is an extremely small step (so it's not running out of memory or anything).

hmedwards commented 2 years ago

Hi @jfy133, thanks so much for your quick reply. I wondered whether it might be a system issue as opposed to the pipeline itself. Hopefully @combiz will have some insight, but I'll also try contacting someone at Imperial to see if they can help at all.

brisk022 commented 2 years ago

Interestingly, we had this problem as well. The temporary fix was to pull the image manually once, so that it was saved in the local singularity cache. Then it used the cached version and the error no longer occurred. The downside is that you have to do it each time the version is updated.

combiz commented 2 years ago

This does look familiar, though I haven't encountered it in a while. We're currently using Singularity on the HPC with the Imperial config and it's working ok. Roman's suggestion to pull the image into a cache before starting sounds good. I'd also be sure to use the latest version of NF (rather than the module load version), for which you'll also need the latest Java SDK (module load java/jdk-16).
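
For reference, a rough sketch of setting up a standalone, up-to-date Nextflow (the module name comes from the suggestion above; adjust paths and module names for your system):

module load java/jdk-16                  # recent Java SDK needed by newer Nextflow
curl -s https://get.nextflow.io | bash   # installs a self-contained ./nextflow launcher
./nextflow -version                      # confirm the version before re-running the pipeline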

hmedwards commented 2 years ago

Thanks @brisk022 and @combiz

I am using Nextflow version 22.04.3, so I don't think it is a version issue?

This is new to me, I'm afraid. Could you explain how to pull an image into a cache, please? Thanks!

brisk022 commented 2 years ago

Unless you explicitly disable caching, singularity should do it automatically. You can pull the image and then delete it.

singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
rm eager.tmp.sif

After that, when you list the cached items with singularity cache list -v, there should be some items with recent timestamps.

Another way to check is to pull the image again. This time singularity should inform you that it is using the cached image.

$ singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
INFO:    Using cached SIF image

Unfortunately, I cannot say anything about versions. FWIW, we are using nextflow v22.04.0 from the bioconda channel.
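
If it helps, a minimal sketch of setting that up with conda (the environment name is just illustrative):

conda create -n nextflow22.04 -c bioconda -c conda-forge nextflow=22.04.0
conda activate nextflow22.04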

hmedwards commented 2 years ago

Thanks. I tried re-running with the same Nextflow version as you (22.04.0 from bioconda), but the same error occurred.

Then I tried pulling and deleting the image as suggested, but it seems it was already using a cached image? In any case, after running this I got the same container creation error.


(nextflow22.04) [he11@login-a nf-core]$ singularity pull --name eager.tmp.sif docker://nfcore/eager:2.4.4
INFO:    Using cached SIF image
(nextflow22.04) [he11@login-a nf-core]$ rm eager.tmp.sif
(nextflow22.04) [he11@login-a nf-core]$ singularity cache list -v
NAME                     DATE CREATED           SIZE             TYPE
006c60b566117b45543ffe   2022-06-07 12:00:11    1.32 GiB         blob
11a4244dfa1c973d17ae6e   2022-06-07 11:59:36    0.74 KiB         blob
2ecb54bbaab44a001995fc   2022-06-07 11:59:34    0.09 KiB         blob
5225e31eacb3728e531e0f   2022-06-07 12:00:13    4.98 KiB         blob
5849476b4bf8a724cc5539   2022-06-07 11:59:33    547.29 KiB       blob
679c171d6942954a759f2d   2022-06-07 11:59:33    50.53 MiB        blob
7882648efbd1a387aeafae   2022-06-07 11:59:35    0.09 KiB         blob
852e50cd189dfeb54d9768   2022-06-07 11:59:28    25.85 MiB        blob
a6236801494d5ca9acfae6   2022-06-07 11:59:31    76.65 MiB        blob
cdc1d48f72cc1668a39fcd   2022-06-07 12:00:13    1.56 KiB         blob
d425ff08d54d93f02c9f95   2022-06-07 12:00:12    4.15 KiB         blob
198b710c2118afced6cbb2   2022-06-07 12:42:55    1.43 GiB         oci-tmp

There are 1 container file(s) using 1.43 GiB and 11 oci blob file(s) using 1.47 GiB of space
Total space used: 2.90 GiB

combiz commented 2 years ago

I've asked ICT as it may be due to changes in binding paths for containers in Singularity 3.8 vs 3.7.

e.g. https://github.com/apptainer/singularity/issues/6181#issuecomment-937315849


jfy133 commented 2 years ago

Hi @hmedwards @brisk022 @combiz, I'm going to close this now as it's not an eager-specific error by the sounds of it, but feel free to keep communicating here if you wish!

charmoniumQ commented 1 year ago

I'm running into this error too.

Although the crash is coming from Singularity, I think the error is in Nextflow or in this pipeline. The documentation says "Beware that the mount points must exist in the built image", and accordingly the referenced apptainer ticket (apptainer/singularity#6181) was closed as intended behavior. Ideally, these mount points would already exist in the container, nfcore/eager:2.4.6.

I'm willing to debug this myself, but I am confused about when Nextflow decides to create a bind mount. Although the eager pipeline crashes this way, I can't replicate it with simpler examples. Does anyone have ideas about that?

jfy133 commented 1 year ago

So more info on singularity in Nextflow:

https://www.nextflow.io/docs/latest/container.html#singularity

However, I suspect the trick is here: https://www.nextflow.io/docs/latest/config.html#scope-singularity

You need to specify the autoMounts setting in the singularity scope of a Nextflow configuration file, but note it has a caveat:

When true Nextflow automatically mounts host paths in the executed container. It requires the user bind control feature enabled in your Singularity installation (default: false).

Is this set accordingly for you?
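
If it isn't, a minimal sketch of a custom config that enables it (assuming Singularity is the container engine on your cluster; the file name is just illustrative) would be:

cat > custom.config <<'EOF'
singularity {
    enabled    = true
    autoMounts = true
}
EOF

which you would then pass to the pipeline with -c custom.config.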

To investigate further, you can look inside the .command.run file present in each working directory (this is the actual bash/batch script executed by Nextflow). The relevant section for debugging is likely nxf_launch().
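
For example, something like this (using the failed task's work directory from the error above) should show the singularity exec line and the -B/--bind paths Nextflow requests:

grep -A 5 'nxf_launch' /.../nf-core/work/69/4f93b33b26be47ec75c0b608efe4d4/.command.run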

charmoniumQ commented 1 year ago

I started with the underlying .command.run and found this minimal non-working example:

$ mkdir /tmp/test
$ singularity exec -B /tmp/test https://depot.galaxyproject.org/singularity/python:3.8.3 pwd
INFO:    Converting SIF file to temporary sandbox...
WARNING: Skipping mount /home/azureuser/spack/opt/spack/linux-ubuntu22.04-zen2/gcc-11.3.0/apptainer-1.1.5-tkaiqwrpiog2vzr5okpp77nqpvdtwmv6/var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
INFO:    Cleaning up image...
FATAL:   container creation failed: mount hook function failure: mount /tmp/test->/tmp/test error: while mounting /tmp/test: destination /tmp/test doesn't exist in container

The bug might lie in my installation or configuration of Singularity, but the current Singularity documentation would suggest otherwise. It says that the mount points must already exist in the container, so Nextflow is incorrect in asking to mount something that does not already exist. Perhaps that documentation is out of date, or I am understanding it incorrectly.

I do have user bind control = yes in path/to/singularity/etc/singularity/singularity.conf, which is owned by root, and has allow setuid = yes.

I installed singularityce (and e2fsprogs and squashfuse) with Spack. I tried with and without suid, with both apptainer and singularityce. The problem still persists.

charmoniumQ commented 1 year ago

It seems my problem was that I was trying to share a directory in /tmp and my /path/to/etc/singularity/singularity.conf has mount tmp = yes, which apparently conflicts.

singularity exec --bind /a/b/c ... works, and if I set mount tmp = no, then --bind /tmp/test also works.
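
In other words (reusing the image URI from my earlier example; /scratch/test is just an illustrative path outside /tmp):

# works with the default config, since the bind target is not under the container's tmp mount
mkdir -p /scratch/test
singularity exec --bind /scratch/test https://depot.galaxyproject.org/singularity/python:3.8.3 pwd

# apparently only works after setting "mount tmp = no" in singularity.conf
mkdir -p /tmp/test
singularity exec --bind /tmp/test https://depot.galaxyproject.org/singularity/python:3.8.3 pwd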

Given this, I don't understand why the error message in Singularity and Apptainer was "error while mounting /tmp/test: destination /tmp/test doesn't exist in container". Also, I don't know what that bit of Singularity documentation about the mount point having to exist means. I will file issues in those respective repositories. At least my Nextflow work can continue.

Thank you @jfy133 and @marissaDubbelaar for pointing me in the right direction.

jfy133 commented 1 year ago

Thanks for investigating, @charmoniumQ! Glad you were able to solve it somewhat :)