Open HarryHung opened 2 months ago
Which executor are you using?
Same issue on both local and LSF.
If you cd into the task work directory and execute bash .command.run, is the same issue reported?
The same issue does NOT occur when I execute bash .command.run
within the task work directory.
(further investigation below the error and output)
This is the original error:
ERROR ~ Error executing process > 'GBS_RES:srst2_for_res_typing (1)'
Caused by:
Missing output file(s) `test*.bam` expected by process `GBS_RES:srst2_for_res_typing (1)`
Command executed:
srst2 --samtools_args '\-A' --input_pe test_1.fastq.gz test_2.fastq.gz --output test --log --save_scores --min_coverage 99.9 --max_divergence 5 --gene_db GBS_Res_Gene-DB_Final.fasta
touch test__fullgenes__GBS_Res_Gene-DB_Final__results.txt
Command exit status:
0
Command output:
bucket 7: 30%
bucket 7: 40%
bucket 7: 50%
bucket 7: 60%
bucket 7: 70%
bucket 7: 80%
bucket 7: 90%
bucket 7: 100%
Sorting block of length 114 for bucket 7
(Using difference cover)
Sorting block time: 00:00:00
Returning block of 115 for bucket 7
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 444
fchr[G]: 715
fchr[T]: 1040
fchr[$]: 1463
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 4195680 bytes to primary EBWT file: GBS_Res_Gene-DB_Final.fasta.rev.1.bt2
Wrote 372 bytes to secondary EBWT file: GBS_Res_Gene-DB_Final.fasta.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 1463
bwtLen: 1464
sz: 366
bwtSz: 366
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 92
offsSz: 368
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 8
numLines: 8
ebwtTotLen: 512
ebwtTotSz: 512
color: 0
reverse: 1
Total time for backward call to driver() for mirror index: 00:00:00
Command error:
WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_TASK_WORKDIR as environment variable will not be supported in the future, use APPTAINERENV_NXF_TASK_WORKDIR instead
Building a SMALL index
(ERR): mkfifo(/tmp/62.inpipe1) failed.
Exiting now ...
Work dir:
/home/ubuntu/local-repo/GBS-Typer-sanger-nf/work/c4/55eacec7ba5627ef369b11d433e025
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
-- Check '.nextflow.log' file for details
This is the bash .command.run output:
❯ cd /home/ubuntu/local-repo/GBS-Typer-sanger-nf/work/c4/55eacec7ba5627ef369b11d433e025
❯ bash .command.run
WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_TASK_WORKDIR as environment variable will not be supported in the future, use APPTAINERENV_NXF_TASK_WORKDIR instead
1224227 reads; of these:
1224227 (100.00%) were paired; of these:
1224187 (100.00%) aligned concordantly 0 times
40 (0.00%) aligned concordantly exactly 1 time
0 (0.00%) aligned concordantly >1 times
----
1224187 pairs aligned concordantly 0 times; of these:
174 (0.01%) aligned discordantly 1 time
----
1224013 pairs aligned 0 times concordantly or discordantly; of these:
2448026 mates make up the pairs; of these:
2444553 (99.86%) aligned 0 times
3473 (0.14%) aligned exactly 1 time
0 (0.00%) aligned >1 times
0.16% overall alignment rate
[samopen] SAM header is present: 19 sequences.
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
Additional observations:
I notice that if I use the bash .command.run generated by Nextflow 22.10.7, the FIFO files under /tmp always have 4-5 digit names, while the bash .command.run generated by Nextflow 24.04.4 creates FIFO files with 2-digit names.
My previous assumption that the error is caused by multiple processes seems to be incorrect, as the same error still happens with executor.queueSize = 1 in the nextflow.config. But somehow singularity.runOptions = '-B $(mktemp -d):/tmp' avoids this error. Maybe the issue is not a concurrent-process namespace conflict, but that later processes are somehow unaware of the existing content in /tmp?
I am not sure what is happening; please let me know if you need more information.
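To illustrate the suspected failure mode, here is a minimal sketch (names and paths are illustrative, not taken from the pipeline): two processes sharing the same /tmp both try to create a FIFO with the same short, PID-derived name, the way Bowtie2 does, and the second mkfifo fails just like the "(ERR): mkfifo(/tmp/62.inpipe1) failed." in the log above.

```python
import errno
import os
import tempfile

# Stand-in for the shared host /tmp that Singularity mounts by default.
tmp = tempfile.mkdtemp()
pipe = os.path.join(tmp, "62.inpipe1")  # short, PID-style name

os.mkfifo(pipe)        # first process: succeeds
try:
    os.mkfifo(pipe)    # second process: name already taken
except OSError as e:
    assert e.errno == errno.EEXIST
    print("mkfifo failed:", e.strerror)
```

This is only an analogy for the collision, not a claim about what Nextflow changed between versions.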
You can test it out by cloning https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf.git (please test commit d98cb52, as the workaround might be in place for later commits/versions), then run (with Singularity installed) nextflow run main.nf --reads 'tests/regression_test_data/input_data/*_{1,2}.fastq.gz' --results_dir output -profile sanger to reproduce the error.
Very hard to help without a test case to replicate the issue.
Hi @pditommaso, the last bit of my latest message contains a test case. Thanks!
You can test it out by cloning https://github.com/sanger-bentley-group/GBS-Typer-sanger-nf.git (please test commit d98cb52, as the workaround might be in place for later commits/versions), then run (with Singularity installed) nextflow run main.nf --reads 'tests/regression_test_data/input_data/*_{1,2}.fastq.gz' --results_dir output -profile sanger to reproduce the error.
I cannot pull the full pipeline execution. I need a self-contained test case that runs a single task reproducing this error, using the local executor and Slurm.
Sure, I have put together a minimal test case to let you replicate the issue with the local executor. It should complete without error with Nextflow 22.10.7, but fail with Nextflow 24.04.4.
You will still need a few essential input data files; download the data directory of this self-contained test case: https://drive.google.com/drive/folders/1XWpx8mU9hQHuCQ6zE-7JOdGsxxABiAqE?usp=sharing
main.nf
process srst2_for_gbs_res_typing {
    input:
    tuple val(pair_id), file(reads) // ID and paired read files
    path db                         // Resistance database file

    output:
    tuple val(pair_id), file("${pair_id}*.bam"), emit: bam_files

    script:
    """
    srst2 --samtools_args '\\-A' --input_pe ${reads[0]} ${reads[1]} --output ${pair_id} --log --save_scores --min_coverage 99.9 --max_divergence 5 --gene_db ${db}
    """
}

process srst2_for_other_res_typing {
    input:
    tuple val(pair_id), file(reads) // ID and paired read files
    path db                         // Resistance database file

    output:
    tuple val(pair_id), file("${pair_id}*.bam"), emit: bam_files

    script:
    """
    srst2 --samtools_args '\\-A' --input_pe ${reads[0]} ${reads[1]} --output ${pair_id} --log --save_scores --min_coverage 70 --max_divergence 30 --gene_db ${db}
    """
}

workflow {
    Channel.fromFilePairs( 'data/*_{1,2}.fastq.gz', checkIfExists: true )
        .set { read_pairs_ch }

    gbs_res_typer_db = channel.fromPath('data/GBS_Res_Gene-DB_Final.fasta', checkIfExists: true)
    other_res_db = channel.fromPath('data/ResFinder.fasta', checkIfExists: true)

    srst2_for_gbs_res_typing(read_pairs_ch, gbs_res_typer_db)
    srst2_for_other_res_typing(read_pairs_ch, other_res_db)
}
nextflow.config
process.container = 'bluemoon222/gbs-typer-sanger-nf:0.0.7'

singularity {
    enabled = true
    autoMounts = true
    cacheDir = "$PWD"
}
Bug report
When users try to run this pipeline with Singularity, it works on Nextflow 22.10.7 and earlier, but fails on Nextflow 23.04.0 and later (including the latest release).
The failure happens when srst2 inside the container tries to run Bowtie2, which attempts to create FIFO files under /tmp via mkfifo. As Singularity mounts /tmp by default, when multiple processes are running srst2, all of them write their FIFO files to the host /tmp directory.
In Nextflow 22.10.7 and before, the FIFO files have longer file names, e.g. 124813.inpipe1, 124813.inpipe2, 124874.inpipe1, 124874.inpipe2, 124964.inpipe1, 124964.inpipe2, and all is well.
However, in Nextflow 23.04.0 and later, the FIFO files have much shorter file names, e.g. 61.inpipe1, 61.inpipe2, 62.inpipe1, 62.inpipe2, 63.inpipe1, 63.inpipe2. The relevant processes soon crash due to what I think is a namespace conflict, and the error looks like the one shown above.
The only thing changed between my tests is the Nextflow executable version, nothing else. I compared the .command.run and .command.sh between runs; they look identical (except for the work dir paths).
At this point, I am wondering whether some hidden environment variable changed between these Nextflow versions in a way that affects the behaviour of Singularity.
This seems to be a related issue: https://github.com/nf-core/taxprofiler/issues/422
I am able to work around the issue by forcing each container to use a different subdirectory in /tmp, by adding singularity.runOptions = '-B $(mktemp -d):/tmp' to nextflow.config.
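For reference, the workaround reported earlier in the thread, as a nextflow.config fragment (the bind mount gives each run a fresh host temp directory as its /tmp, so concurrent containers cannot collide):

```groovy
// Workaround: bind a fresh host temp directory over /tmp inside the
// container, so FIFO names from concurrent srst2/Bowtie2 runs cannot clash.
singularity.runOptions = '-B $(mktemp -d):/tmp'
```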
Expected behavior and actual behavior
Expected: Singularity should behave the same regardless of the Nextflow version.
Actual: the latest Nextflow introduces a new bug.
Steps to reproduce the problem
Run the pipeline with -profile sanger
Program output
N/A
Environment
Additional context
N/A