EricFournier3 closed this issue 4 days ago
Hi @EricFournier3, thanks for the bug report! You're totally right: a VADR failure shouldn't cause the whole pipeline to error out. The error should be ignored, since the VADR output is optional in most cases.
In the meantime, you might be able to use a custom config to set errorStrategy = 'ignore':
process {
    withName: "VADR_.*" {
        errorStrategy = 'ignore'
    }
}
Hope that helps while the issue is fixed in the workflow.
Hi @peterk87, this time the pipeline failed on NF_FLU:ILLUMINA:BLAST_BLASTN_IRMA:
BLAST engine error: Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data Warning: Sequence contains no data
[b5/bf4bde] NOTE: Process `NF_FLU:ILLUMINA:VADR_IRMA (L00955952004)` terminated with an error exit status (2) -- Error is ignored
Error executing process > 'NF_FLU:ILLUMINA:BLAST_BLASTN_IRMA (L00955952004)'
Caused by:
Process `NF_FLU:ILLUMINA:BLAST_BLASTN_IRMA (L00955952004)` terminated with an error exit status (3)
Command executed:
DB=`find -L ./ -name "*.ndb" | sed 's/.ndb//'`
blastn \
-num_threads 8 \
-db $DB \
-query L00955952004.irma.consensus.fasta \
-outfmt "6 qaccver saccver pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen qcovs stitle" -num_alignments 1000000 -evalue 1e-6 \
-out L00955952004.blastn.txt
cat <<-END_VERSIONS > versions.yml
"NF_FLU:ILLUMINA:BLAST_BLASTN_IRMA":
blast: $(blastn -version 2>&1 | sed 's/^.*blastn: //; s/ .*$//')
END_VERSIONS
Is there a way to revert to a previous version without VADR?
I tried with
nextflow pull nextflow pull CFIA-NCFAD/nf-flu -r b0d6575b6d
but it didn't work:
Checking nextflow ...
WARN: Cannot read project manifest -- Cause: Remote resource not found: https://api.github.com/repos/nextflow-io/nextflow/contents/nextflow.config?ref=b0d6575b6d
Remote resource not found: https://api.github.com/repos/nextflow-io/nextflow/contents/main.nf?ref=b0d6575b6d
Thanks
I also noticed on the GitHub front page that the revisions are for nf-iav-illumina instead of nf-flu. I am a little confused about this.
Hi @EricFournier3, is L00955952004_1 an empty sequence? I am trying to reproduce the issue and come up with a fix. If I provide an empty sequence (just a header), e.g.
>empty_seq
I get a similar issue.
$ v-annotate.pl --mkey flu -r --atgonly --xnocomp --nomisc --alt_fail extrant5,extrant3 --noseqnamemax --mdir vadr-model empty.fa empty
# v-annotate.pl :: classify and annotate sequences using a model library
# VADR 1.6.4 (Jun 2024)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# date: Wed Oct 16 14:51:05 2024
# $VADRBIOEASELDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/bin
# $VADRBLASTDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/bin
# $VADREASELDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/bin
# $VADRINFERNALDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/bin
# $VADRMODELDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/share/vadr-1.6.4/vadr-models
# $VADRSCRIPTSDIR: /home/pkruczkiewicz/miniforge3/envs/vadr/share/vadr-1.6.4/vadr
#
# sequence file: empty.fa
# output directory: empty
# only consider ATG a valid start codon: yes [--atgonly]
# specify that alert codes in <s> cause FAILure: extrant5,extrant3 [--alt_fail]
# .cm, .minfo, blastn .fa files in $VADRMODELDIR start with key <s>, not 'vadr': flu [--mkey]
# model files are in directory <s>, not in $VADRMODELDIR: vadr-model [--mdir]
# in feature table for failed seqs, never change feature type to misc_feature: yes [--nomisc]
# turn off composition-based for blastx statistics with -comp_based_stats 0: yes [--xnocomp]
# replace stretches of Ns with expected nts, where possible: yes [-r]
# do not enforce a maximum length of 50 for sequence names (GenBank max): yes [--noseqnamemax]
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# Validating input ... Failed to fetch subseq: Requested start 1 isn't in the sequence empty at /home/pkruczkiewicz/miniforge3/envs/vadr/lib/perl5/5.32/site_perl/Bio/Easel/SqFile.pm line 713.
IRMA may be producing empty consensus FASTA files and those should be ignored for downstream processing.
I'm working on a patch release to try to address this issue.
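As a rough illustration of what "ignoring empty consensus sequences" could look like, here is a minimal Python sketch. It is not the actual nf-flu fix; the function names read_fasta and drop_empty_records are hypothetical and only show the idea of dropping header-only records before they reach VADR or BLAST.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from a FASTA text stream.

    Hypothetical helper for illustration; not part of nf-flu.
    """
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, even if it had no sequence.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_empty_records(records):
    """Keep only records that contain at least one residue."""
    return [(header, seq) for header, seq in records if seq]


# Example with one header-only record, like the empty.fa case above.
example = io.StringIO(">empty_seq\n>seg1\nACGT\nAC\n")
records = list(read_fasta(example))
kept = drop_empty_records(records)
```

To run this on a real file, pass an open file handle instead of the StringIO object, e.g. read_fasta(open("L00955952004.irma.consensus.fasta")), using the file name from the blastn command above. If kept comes back empty, the sample has no usable consensus sequence.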
To revert to a previous version, you can specify a release tag from https://github.com/CFIA-NCFAD/nf-flu/releases
e.g. to run nf-flu version 3.3.10, which predates the addition of VADR:
nextflow run CFIA-NCFAD/nf-flu -r 3.3.10 ...
The "official" repo is under the CFIA-NCFAD org at https://github.com/CFIA-NCFAD/nf-flu/
I forked when I should have transferred the repo, and I'm afraid of breaking things now by trying to transfer.
Sorry about the confusion! I'll try to make it more clear in the README.
Great, thank you @peterk87. I will revert to version 3.3.10 in the meantime.
Hi @EricFournier3, this issue should be fixed in 3.5.2
nextflow pull CFIA-NCFAD/nf-flu
nextflow run CFIA-NCFAD/nf-flu -r 3.5.2 \
-c slurm.config \
--input $samplesheet.csv \
-profile singularity,slurm \
--platform illumina \
--outdir myoutdir
Let me know if the new release fixes your issue!
Hi @peterk87,
we now have this error with 3.5.2:
N E X T F L O W ~ version 22.10.7
Launching `https://github.com/CFIA-NCFAD/nf-flu` [gloomy_bartik] DSL2 - revision: 10bb2e19cb [3.5.2]
Core Nextflow options
revision : 3.5.2
runName : gloomy_bartik
containerEngine : singularity
launchDir : /data/devel/nf-flu
workDir : /data/devel/nf-flu/work
projectDir : /data/devel/nf-flu/nextflow-home/.nextflow/assets/CFIA-NCFAD/nf-flu
userName : foueri01@inspq.qc.ca
profile : singularity,slurm
configFiles : /data/devel/nf-flu/nextflow-home/.nextflow/assets/CFIA-NCFAD/nf-flu/nextflow.config, /data/devel/nf-flu/scripts/slurm.config
Input/output options
input : /data/devel/nf-flu/samplesheet/samplesheet_run_nextseq_14.csv
platform : illumina
outdir : /data/devel/nf-flu/results/run_nextseq_14
IRMA assembly options
keep_ref_deletions : true
skip_irma_subtyping_report: true
Annotation options
vadr_model_targz : https://ftp.ncbi.nlm.nih.gov/pub/nawrocki/vadr-models/flu/1.6.3-2/vadr-models-flu-1.6.3-2.tar.gz
Max job request options
max_memory : 32 GB
[Only displaying parameters that differ from pipeline default]
------------------------------------------------------
------------------------------------------------------
executor > slurm (13), local (1)
[c2/3b1e9d] process > NF_FLU:ILLUMINA:CHECK_SAMPLE_SHEET (1) [100%] 1 of 1 ✔
[- ] process > NF_FLU:ILLUMINA:READ_COUNT_FAIL_TSV -
[ac/7ca4db] process > NF_FLU:ILLUMINA:READ_COUNT_PASS_TSV [100%] 1 of 1 ✔
[42/be1b92] process > NF_FLU:ILLUMINA:ZSTD_DECOMPRESS_FASTA [100%] 1 of 1 ✔
[75/ccc80b] process > NF_FLU:ILLUMINA:ZSTD_DECOMPRESS_CSV [100%] 1 of 1 ✔
[05/565036] process > NF_FLU:ILLUMINA:BLAST_MAKEBLASTDB_NCBI (41415330-influenza.fasta) [100%] 1 of 1 ✔
[1d/ba4bca] process > NF_FLU:ILLUMINA:SETUP_FLU_VADR_MODEL [100%] 1 of 1 ✔
[1d/c5914d] process > NF_FLU:ILLUMINA:CAT_ILLUMINA_FASTQ (L00955952004) [100%] 1 of 1 ✔
[65/c0d7d6] process > NF_FLU:ILLUMINA:IRMA (L00955952004) [100%] 1 of 1 ✔
[b9/90b024] process > NF_FLU:ILLUMINA:VADR_IRMA (L00955952004) [100%] 1 of 1, failed: 1 ✔
[- ] process > NF_FLU:ILLUMINA:VADR_SUMMARIZE_ISSUES_IRMA -
[- ] process > NF_FLU:ILLUMINA:PRE_TABLE2ASN_IRMA -
[- ] process > NF_FLU:ILLUMINA:TABLE2ASN_IRMA -
[- ] process > NF_FLU:ILLUMINA:POST_TABLE2ASN_IRMA -
[0a/9ac84e] process > NF_FLU:ILLUMINA:BLAST_BLASTN_IRMA (L00955952004) [100%] 1 of 1 ✔
[67/e16ac4] process > NF_FLU:ILLUMINA:SUBTYPING_REPORT_IRMA_CONSENSUS (1) [ 50%] 1 of 2, failed: 1, retries: 1
[5e/bfd8a0] process > NF_FLU:ILLUMINA:PULL_TOP_REF_ID (L00955952004) [ 50%] 1 of 2, failed: 1, retries: 1
[- ] process > NF_FLU:ILLUMINA:SEQTK_SEQ -
[- ] process > NF_FLU:ILLUMINA:MINIMAP2 -
[- ] process > NF_FLU:ILLUMINA:MOSDEPTH_GENOME -
[- ] process > NF_FLU:ILLUMINA:FREEBAYES -
[- ] process > NF_FLU:ILLUMINA:BCF_FILTER_FREEBAYES -
[- ] process > NF_FLU:ILLUMINA:VCF_FILTER_FRAMESHIFT -
[- ] process > NF_FLU:ILLUMINA:BCFTOOLS_STATS -
[- ] process > NF_FLU:ILLUMINA:COVERAGE_PLOT -
[- ] process > NF_FLU:ILLUMINA:BCF_CONSENSUS -
[- ] process > NF_FLU:ILLUMINA:CAT_CONSENSUS -
[- ] process > NF_FLU:ILLUMINA:VADR_BCFTOOLS -
[- ] process > NF_FLU:ILLUMINA:VADR_SUMMARIZE_ISSUES_BCFTOOLS -
[- ] process > NF_FLU:ILLUMINA:PRE_TABLE2ASN_BCFTOOLS -
[- ] process > NF_FLU:ILLUMINA:TABLE2ASN_BCFTOOLS -
[- ] process > NF_FLU:ILLUMINA:POST_TABLE2ASN_BCFTOOLS -
[- ] process > NF_FLU:ILLUMINA:BLAST_BLASTN_CONSENSUS -
[- ] process > NF_FLU:ILLUMINA:SUBTYPING_REPORT_BCF_CONSENSUS -
[- ] process > NF_FLU:ILLUMINA:MQC_VERSIONS_TABLE -
[- ] process > NF_FLU:ILLUMINA:MULTIQC -
[b9/90b024] NOTE: Process `NF_FLU:ILLUMINA:VADR_IRMA (L00955952004)` terminated with an error exit status (1) -- Error is ignored
[6c/b7fa83] NOTE: Process `NF_FLU:ILLUMINA:PULL_TOP_REF_ID (L00955952004)` terminated with an error exit status (1) -- Execution is retried (1)
[cc/9b43a8] NOTE: Process `NF_FLU:ILLUMINA:SUBTYPING_REPORT_IRMA_CONSENSUS (1)` terminated with an error exit status (1) -- Execution is retried (1)
Error executing process > 'NF_FLU:ILLUMINA:PULL_TOP_REF_ID (L00955952004)'
Caused by:
Process `NF_FLU:ILLUMINA:PULL_TOP_REF_ID (L00955952004)` terminated with an error exit status (1)
Command executed:
parse_influenza_blast_results.py \
--flu-metadata 41415333-influenza.csv \
--get-top-ref True \
--top 1 \
--pident-threshold 0.85 \
--sample-name L00955952004 \
L00955952004.blastn.txt
cat <<-END_VERSIONS > versions.yml
"NF_FLU:ILLUMINA:PULL_TOP_REF_ID":
python: $(python --version | sed 's/Python //g')
END_VERSIONS
Command exit status:
1
That's really strange. Are the IRMA consensus sequences completely empty? Do they look anomalous to you?
Would it be possible to share the input files in some way so I could do some more in-depth debugging?
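One quick way to answer the "completely empty or anomalous" question locally is to summarize each record in the IRMA consensus FASTA. The Python sketch below is only an assumption about what is worth checking (sequence length and number of N bases per record); fasta_stats is a hypothetical helper, not something shipped with nf-flu.

```python
def fasta_stats(text):
    """Return {header: (length, n_count)} for FASTA-formatted text.

    Hypothetical helper for illustration; not part of nf-flu.
    """
    stats = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            # Header-only records stay at (0, 0), which flags them as empty.
            stats[header] = (0, 0)
        elif header is not None and line:
            length, n_count = stats[header]
            stats[header] = (length + len(line), n_count + line.upper().count("N"))
    return stats


# Small inline example: one record with some Ns, one header-only record.
example_stats = fasta_stats(">a\nACGN\nnn\n>b\n")
```

For a real sample, read the file first, e.g. fasta_stats(open("L00955952004.irma.consensus.fasta").read()), using the file name from the blastn command above. Records with length 0, or consisting mostly of Ns, would be the ones to look at more closely.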
I sent you a OneDrive link at Peter.Kruczkiewicz@inspection.gc.ca
Hi @EricFournier3 would you be able to try sending the link again? Your email might have been blocked by spam filters.
Hi @peterk87, yes, I just sent it to you again.
Is there an existing issue for this?
Description of the Bug/Issue
Hi,
would it be possible to add errorStrategy 'ignore' to the VADR process and still produce the sample outputs as in the previous version (consensus, reports, images, etc.)? Currently, when this step fails for one sample, the whole pipeline stops. We are using nf-flu in our local in-house global Influenza pipeline, and when it fails this way, none of the other samples are processed.
For your info, this problematic sample (L00955952004) was perfectly processed with the previous version of nf-flu (without the VADR process)
Thanks
Nextflow command-line
Error Message
Workflow Version
3.5.1 revision: f18f8ce53d [master]
Nextflow Executor
slurm
Nextflow Version
version 22.10.7
Java Version
java version "17.0.8" 2023-07-18 LTS Java(TM) SE Runtime Environment (build 17.0.8+9-LTS-211) Java HotSpot(TM) 64-Bit Server VM (build 17.0.8+9-LTS-211, mixed mode, sharing)
Hardware
cluster
Operating System (OS)
CentOS Linux release 7.9.2009 (Core)
Conda/Container Engine
None
Additional context
nextflow.log