nf-core / taxprofiler

Highly parallelised multi-taxonomic profiling of shotgun short- and long-read metagenomic data
https://nf-co.re/taxprofiler
MIT License

Centrifuge error : (ERR): mkfifo(/tmp/72.inpipe1) failed. #422

Closed (erinyoung closed 5 months ago)

erinyoung commented 7 months ago

Description of the bug

I downloaded the pre-built centrifuge database from https://benlangmead.github.io/aws-indexes/k2 (the full link is https://genome-idx.s3.amazonaws.com/centrifuge/nt_2018_3_3.tar.gz) to use with taxprofiler. It does not run as expected. I first brought this up in Slack.

This is my database file (the listed kraken2 databases worked fine):

tool,db_name,db_params,db_path
kraken2,nt,,../data/k2_nt_20230502.tar.gz
kraken2,viral,,../data/k2_viral_20231009.tar.gz
centrifuge,nt_2018,,../data/nt_2018_3_3.tar.gz

This is the error message:

Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: 17

Error executing process > 'NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_CENTRIFUGE (nt_2018|B002f_pe)'

Caused by:
  Process `NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_CENTRIFUGE (nt_2018|B002f_pe)` terminated with an error exit status (17)

Command executed:

  ## we add "-no-name ._" to ensure silly Mac OSX metafiles files aren't included
  db_name=`find -L nt_2018_3_3 -name "*.1.cf" -not -name "._*"  | sed 's/\.1.cf$//'`
  centrifuge \
      -x $db_name \
      -p 12 \
      -1 B002f_VH00770.unmapped_1.fastq.gz -2 B002f_VH00770.unmapped_2.fastq.gz \
      --report-file B002f_pe_VH00770_nt_2018.centrifuge.report.txt \
      -S B002f_pe_VH00770_nt_2018.centrifuge.results.txt \
       \
       \

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_CENTRIFUGE":
      centrifuge: $( centrifuge --version  | sed -n 1p | sed 's/^.*centrifuge-class version //')
  END_VERSIONS

Command exit status:
  17

Command output:
  (empty)

Command error:
  INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
  INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
  (ERR): mkfifo(/tmp/72.inpipe1) failed.
  Exiting now ...

Work dir:
  /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work/11/33708dd319d5e790a7d5b44e106865

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

Command used and terminal output

nextflow run nf-core/taxprofiler \
         -profile singularity \
         --input sample_sheet.csv \
         --databases database.txt \
         --outdir taxprofiler \
         --run_kraken2 \
         --run_centrifuge \
         -with-tower \
         -resume \
         --hostremoval_reference reference.fasta \
         --save_hostremoval_bam true \
         --perform_shortread_hostremoval \
         --save_hostremoval_unmapped true \
         --perform_shortread_qc \
         -c time.config
N E X T F L O W  ~  version 23.10.0
Launching `https://github.com/nf-core/taxprofiler` [desperate_cuvier] DSL2 - revision: 3d4eda2dbb [master]

------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/taxprofiler v1.1.2-g3d4eda2
------------------------------------------------------
Core Nextflow options
  revision                     : master
  runName                      : desperate_cuvier
  containerEngine              : singularity
  launchDir                    : /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01
  workDir                      : /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work
  projectDir                   : /home/eriny/.nextflow/assets/nf-core/taxprofiler
  userName                     : eriny
  profile                      : singularity
  configFiles                  : 

Input/output options
  input                        : sample_sheet.csv
  databases                    : database_all.txt
  outdir                       : taxprofiler

Preprocessing short-read QC options
  perform_shortread_qc         : true

Preprocessing host removal options
  perform_shortread_hostremoval: true
  hostremoval_reference        : reference.fasta
  save_hostremoval_bam         : true
  save_hostremoval_unmapped    : true

Profiling options
  run_centrifuge               : true
  run_kraken2                  : true
  ganon_report_rank            : default

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
If you use nf-core/taxprofiler for your analysis please cite:

* The pipeline
  https://doi.org/10.5281/zenodo.7728364

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://github.com/nf-core/taxprofiler/blob/master/CITATIONS.md

Monitor the execution with Nextflow Tower using this URL: https://tower.nf/user/erin-olde/watch/5y7UeijszzHmnN
executor >  local (4)
[11/307bce] process > NFCORE_TAXPROFILER:TAXPROFILER:INPUT_CHECK:SAMPLESHEET_CHECK (sample_sheet.csv)             [100%] 1 of 1, cached: 1 ✔
[8e/78ff0d] process > NFCORE_TAXPROFILER:TAXPROFILER:DB_CHECK:UNTAR (k2_nt_20230502.tar.gz)                       [100%] 3 of 3, cached: 3 ✔
[d7/b16a2d] process > NFCORE_TAXPROFILER:TAXPROFILER:FASTQC (B016g)                                               [100%] 16 of 16, cached: 16 ✔
[-        ] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_PREPROCESSING:SHORTREAD_FASTP:FASTP_SINGLE         -
[c8/a8b135] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_PREPROCESSING:SHORTREAD_FASTP:FASTP_PAIRED (B013f) [100%] 16 of 16, cached: 16 ✔
[41/94fab5] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_PREPROCESSING:FASTQC_PROCESSED (B013g)             [100%] 16 of 16, cached: 16 ✔
[97/b9c3b6] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:BOWTIE2_BUILD (reference.fasta)        [100%] 1 of 1, cached: 1 ✔
[47/54c4f2] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:BOWTIE2_ALIGN (B015g)                  [100%] 16 of 16, cached: 16 ✔
[6d/fc97f5] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:SAMTOOLS_INDEX (B016f)                 [100%] 16 of 16, cached: 16 ✔
[98/78ecba] process > NFCORE_TAXPROFILER:TAXPROFILER:SHORTREAD_HOSTREMOVAL:SAMTOOLS_STATS (B021f)                 [ 93%] 15 of 16, cached: 15
[b0/29c8c0] process > NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:KRAKEN2_KRAKEN2 (nt|kraken2|B023f_pe)              [ 15%] 5 of 32, cached: 5
[60/1788d7] process > NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_CENTRIFUGE (nt_2018|B013f_pe)           [ 12%] 2 of 16, cached: 2
[-        ] process > NFCORE_TAXPROFILER:TAXPROFILER:PROFILING:CENTRIFUGE_KREPORT                                 [  0%] 0 of 2
[-        ] process > NFCORE_TAXPROFILER:TAXPROFILER:CUSTOM_DUMPSOFTWAREVERSIONS                                  -
[-        ] process > NFCORE_TAXPROFILER:TAXPROFILER:MULTIQC                                                      -

Relevant files

No response

System information

Nextflow version: 23.10.0 build 5889
Hardware: Local
Executor: local
Container engine: Singularity
OS: CentOS
Version of nf-core/taxprofiler: current

Midnighter commented 7 months ago

This is not yet enough information to go on. The "mkfifo ... failed" message points to one of the bash pipes failing, but I'm not certain yet. Can you paste the contents of /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work/11/33708dd319d5e790a7d5b44e106865/.command.log, please? And if they differ, also .command.out and .command.err? Thanks in advance.

Midnighter commented 7 months ago

Something else that you can try is, when in that work directory, run bash .command.run and observe interactively what happens. You might also edit the .command.sh file and include a line before the centrifuge command:

echo $db_name

to see if something went wrong there.

erinyoung commented 7 months ago

Here are the contents of .command.log:

$ cat /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work/11/33708dd319d5e790a7d5b44e106865/.command.log
INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
(ERR): mkfifo(/tmp/72.inpipe1) failed.
Exiting now ...

The contents of .command.err look the same (to me) as .command.log

$ cat /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work/11/33708dd319d5e790a7d5b44e106865/.command.err
INFO:    Environment variable SINGULARITYENV_NXF_TASK_WORKDIR is set, but APPTAINERENV_NXF_TASK_WORKDIR is preferred
INFO:    Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
(ERR): mkfifo(/tmp/72.inpipe1) failed.
Exiting now ...

The .command.out file is empty

$ cat /Volumes/IDGenomics_NAS/Bioinformatics/eriny/mosquito/2023-12-01/work/11/33708dd319d5e790a7d5b44e106865/.command.out

erinyoung commented 7 months ago

When I edited the .command.sh script to echo the $db_name variable, the result was nt_2018_3_3/nt.

Is this the correct value?
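For reference, here is a small sketch of what the find/sed line from the failing command produces. The directory and files below are empty stand-ins created in a scratch folder (an assumption for illustration, not the real database); only the find expression itself is copied from the command shown earlier.

```shell
# Build a throwaway directory that mimics an unpacked index.
# The files are empty placeholders with assumed names.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/nt_2018_3_3"
touch "$demo_dir/nt_2018_3_3/nt.1.cf" \
      "$demo_dir/nt_2018_3_3/nt.2.cf" \
      "$demo_dir/nt_2018_3_3/nt.3.cf" \
      "$demo_dir/nt_2018_3_3/._nt.1.cf"   # macOS metadata file that should be skipped

# Same expression as in .command.sh: locate the ".1.cf" file and strip that suffix.
db_name=$(cd "$demo_dir" && find -L nt_2018_3_3 -name "*.1.cf" -not -name "._*" | sed 's/\.1.cf$//')
echo "$db_name"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

With this layout the sketch prints nt_2018_3_3/nt, the same value reported above. That is the index prefix form that centrifuge -x takes, so the value itself does not look like the cause of the mkfifo error.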

Midnighter commented 7 months ago

I'm a bit at a loss as to the source of the error. @jfy133 do you have any ideas?

Midnighter commented 7 months ago

Actually, there could be a problem with the hard drive where /tmp is mounted. You say the hardware is local, but is there anything special about it? Is it a network drive, for example? Or maybe it's simply running out of space? /tmp on many Linux systems is actually "mounted in memory", so it might be more limited than you think.
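A quick way to check these points might look like the sketch below. It reports how /tmp is mounted and how full it is, then tries the operation that failed (creating a named pipe). The probe file name is hypothetical, and the check only reflects the shell it is run in; to see what the job sees, it would have to be run inside the same Singularity container.

```shell
# Directory to probe; /tmp is the path that appears in the error message.
tmp_dir="/tmp"

# Show the backing filesystem and free space
# (a "tmpfs" filesystem means /tmp lives in memory).
df -h "$tmp_dir"

# Try to create and remove a named pipe, the operation that failed in the log.
probe="$tmp_dir/mkfifo_probe.$$"   # hypothetical name, made unique with the shell PID
if mkfifo "$probe"; then
    mkfifo_status="ok"
    rm -f "$probe"
else
    mkfifo_status="failed"
fi
echo "mkfifo in $tmp_dir: $mkfifo_status"
```

If the probe succeeds and df shows free space, a full or read-only /tmp becomes a less likely explanation.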

jfy133 commented 7 months ago

I get this sometimes too on my laptop during testing.

My suspicion is that there is some conflict when multiple centrifuge jobs are running at the same time.

I think on my laptop I set maxForks to 1 and that reduced those errors a lot (IIRC).

It's also been reported on the centrifuge repo, I think, but there has been no response.

We have a Slack thread with @erinyoung somewhere where I proposed it.
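As a rough sketch of the maxForks workaround, the lines below write a small custom Nextflow config that limits the Centrifuge process to one task at a time. The config file name is hypothetical; the process name is taken from the error message earlier in the thread. This only serialises the jobs and is not confirmed here as a fix.

```shell
# Write a custom config that allows only one Centrifuge task to run at once.
# centrifuge_maxforks.config is a hypothetical file name.
cat > centrifuge_maxforks.config <<'EOF'
process {
    withName: 'CENTRIFUGE_CENTRIFUGE' {
        maxForks = 1
    }
}
EOF

# Show what was written. The file would then be passed to the run with -c,
# or its process block merged into an existing custom config such as time.config.
cat centrifuge_maxforks.config
```

The trade-off is throughput: with maxForks = 1 the samples are profiled by Centrifuge one after another instead of in parallel.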

Swindle98 commented 5 months ago

I'm getting this issue too. So far, setting the centrifuge process to maxForks = 1 has reduced it; a less than ideal solution, though.

Swindle98 commented 5 months ago

I'm getting this issue too. So far, setting the centrifuge process to maxForks = 1 has reduced it; a less than ideal solution, though.

Spoke too soon.

jfy133 commented 5 months ago

So I believe it is indeed a case where two centrifuge runs on the same node end up trying to make the same pipe name, as per: https://stackoverflow.com/questions/13040021/mkfifo-error-error-creating-the-named-pipe-file-exists

So I think if we can set a unique /tmp/ directory (as @LilyAnderssonLee tried before, but something went iffy), maybe we can get this to work
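The suspected collision can be reproduced in isolation with plain mkfifo. The sketch below uses a scratch directory as a stand-in for a /tmp shared by two jobs and reuses the pipe file name from the error message; whether centrifuge actually hits this case is still the hypothesis of this thread, not something the sketch proves.

```shell
# Scratch directory standing in for a /tmp shared by two jobs.
shared_tmp=$(mktemp -d)
pipe="$shared_tmp/72.inpipe1"   # same file name as in the error message

# The first "job" creates the named pipe.
if mkfifo "$pipe"; then first="ok"; else first="failed"; fi

# The second "job" asks for the same name while the first pipe still exists.
if mkfifo "$pipe" 2>/dev/null; then second="ok"; else second="failed"; fi

echo "first: $first, second: $second"

# Clean up the scratch directory.
rm -rf "$shared_tmp"
```

The second call fails because the path already exists, which is why a unique temporary directory per job would avoid the clash.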

jfy133 commented 5 months ago

My solution doesn't work currently because the --temp-directory option isn't exposed to the user for some reason...

Midnighter commented 5 months ago

Perhaps centrifuge will honor the TMPDIR environment variable? Then we could create a new temporary directory for each process invocation and set TMPDIR to that.

jfy133 commented 5 months ago

Nope, it's literally hardcoded in the software: https://github.com/DaehwanKimLab/centrifuge/blob/9e244583481cc273f74415f2a2418e8fd342ab17/centrifuge#L357

I've opened an issue to ask: https://github.com/DaehwanKimLab/centrifuge/issues/268