Closed: poojasgupta closed this issue 8 months ago
Looks to be a hardware (likely RAM availability) issue - what is the average file size of the fastq file being used for each sample? Since you're running this on an HPC, I would make sure that your system has sufficient RAM to perform this step, as porechop is VERY RAM hungry.
I've added a "retry" feature to "main" that incrementally increases the requested RAM for Porechop up to 3 times, up to 108 GB. Can you reattempt with your data?
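For reference, a common way to implement this kind of retry in a Nextflow pipeline is a process selector in the config that scales memory with the attempt number. The snippet below is only an illustrative sketch with an assumed 36 GB base value; the exact selector name and numbers in taxtriage may differ:

process {
    withName: 'PORECHOP' {
        errorStrategy = 'retry'
        maxRetries    = 2                          // 3 attempts in total
        memory        = { 36.GB * task.attempt }   // 36 GB, 72 GB, 108 GB
    }
}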
Sure, I will give it a try and see if it works. Thank you!
@poojasgupta did the changes fix the issue with porechop for you?
@Merritt-Brian I tried the updated pipeline with my data and it still fails at the porechop step. The error isn't very clear to me.
Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 0.
The full error message was:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'
Caused by:
Missing output file(s) `*.fastq.gz` expected by process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` (note: input files are not included in the default matching set)
Command executed:
porechop \
-i Sample65_UT-P2S01293-240126.fastq.gz \
-t 12 \
\
-o Sample65_UT-P2S01293-240126.fastq.gz
cat <<-END_VERSIONS > versions.yml
"NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP":
porechop: $( porechop --version )
END_VERSIONS
Command exit status:
0
Command output:
Barcode 56 (forward) 79.2 79.2
Barcode 57 (forward) 79.2 76.0
Barcode 58 (forward) 77.8 76.9
Barcode 59 (forward) 77.8 76.0
Barcode 60 (forward) 79.2 80.0
Barcode 61 (forward) 76.0 76.9
Barcode 62 (forward) 80.0 80.0
Barcode 63 (forward) 79.2 79.2
Barcode 64 (forward) 79.2 79.2
Barcode 65 (forward) 80.0 80.0
Barcode 66 (forward) 76.0 79.2
Barcode 67 (forward) 80.0 84.0
Barcode 68 (forward) 80.0 76.0
Barcode 69 (forward) 76.9 76.9
Barcode 70 (forward) 79.2 79.2
Barcode 71 (forward) 79.2 79.2
Barcode 72 (forward) 80.0 77.8
Barcode 73 (forward) 76.9 80.0
Barcode 74 (forward) 79.2 77.8
Barcode 75 (forward) 79.2 76.9
Barcode 76 (forward) 79.2 76.9
Barcode 77 (forward) 76.9 84.0
Barcode 78 (forward) 80.8 80.8
Barcode 79 (forward) 80.8 79.2
Barcode 80 (forward) 76.9 75.0
Barcode 81 (forward) 76.9 79.2
Barcode 82 (forward) 77.8 76.9
Barcode 83 (forward) 80.0 76.9
Barcode 84 (forward) 76.0 76.0
Barcode 85 (forward) 83.3 79.2
Barcode 86 (forward) 76.9 80.0
Barcode 87 (forward) 76.9 76.9
Barcode 88 (forward) 75.0 76.9
Barcode 89 (forward) 80.0 80.0
Barcode 90 (forward) 79.2 76.0
Barcode 91 (forward) 75.9 79.2
Barcode 92 (forward) 76.0 77.8
Barcode 93 (forward) 79.2 79.2
Barcode 94 (forward) 79.2 76.0
Barcode 95 (forward) 83.3 76.9
Barcode 96 (forward) 76.0 76.9
No adapters found - output reads are unchanged from input reads
Saving trimmed reads to file
pigz not found - using gzip to compress
Saved result to Sample65_UT-P2S01293-240126.fastq.gz
Work dir:
/Volumes/BioNGS_1/UT-P2S01293-240126/UT-P2S01293-240126/20240126_2026_P2S-01293-A_PAS21742_bded1596/work/68/39e44c153c3282f7fb9e5240c7ba1d
Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file
My nextflow command:
nextflow run jhuapl-bio/taxtriage --input samplesheet.csv --outdir taxtriage_out2 -profile docker --db /Volumes/IDGenomics_NAS/Data/kraken2_plus_pf16gb/k2_pluspf_16gb_20231009 -r main -latest --remove_taxids '9606' --max_memory 128GB --max_cpus 16 --denovo_assembly
I had previously tried without the --max_memory and --max_cpus flags but still got the same error.
It doesn't look like a hardware issue anymore.
The full error message was:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'
Caused by:
Missing output file(s) `*.fastq.gz` expected by process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` (note: input files are not included in the default matching set)
This seems to indicate that none of the reads contain the specific barcodes.
A limitation of the default porechop module for Nextflow is that if there is no difference between the initial fastq file and the porechop output, it does not emit a new file and simply skips that step. Since the pipeline then doesn't detect an output file (the command just exits), the process fails. I can look into overriding the default porechop module. The other solution would be to disable porechop for that specific sample.
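One way to do that module override, sketched generically below, is to have porechop always write to a name that differs from the staged input, so the expected *.fastq.gz is always present when the task exits. The process and file names here are illustrative and not the actual taxtriage module:

process PORECHOP {
    tag "$meta.id"

    input:
    tuple val(meta), path(reads)

    output:
    // The glob can never match the staged input file, so a distinct output is always emitted
    tuple val(meta), path("*.porechop.fastq.gz"), emit: reads

    script:
    def prefix = "${meta.id}.porechop"
    """
    porechop \\
        -i ${reads} \\
        -t ${task.cpus} \\
        -o ${prefix}.fastq.gz
    """
}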
Does porechop modify the original fastq file instead of creating a new "processed" file as an output? For the current run, I can disable porechop, but I would like to still run it for future runs as a pre-processing/quality control tool.
It creates a new file; unfortunately, this is just the nature of how Nextflow works. The error is the result of a mandatory output (in this case a new, modified fastq file) for each of the samples. If there is nothing to trim/bin through porechop, the process will fail, so it is recommended to disable "trim" for any samples where that will occur.
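To make the "mandatory output" point concrete, here is a minimal, generic sketch of how the output declaration drives this behaviour (not the real taxtriage module; names are made up):

process TRIM_STEP {
    input:
    path reads

    output:
    // Mandatory: if nothing matches this glob when the task exits, Nextflow raises
    // the "Missing output file(s)" error shown above
    path "trimmed_*.fastq.gz"
    // Declaring it as `path "trimmed_*.fastq.gz", optional: true` would instead let
    // the task succeed with no file, at the cost of the sample dropping out of the
    // trimmed-reads channel downstream

    script:
    """
    porechop -i ${reads} -t ${task.cpus} -o trimmed_${reads}
    """
}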
Oh okay! Thank you!
I will close this for now; please let me know if the problem persists with porechop for a sample that is expected to contain trimmable data. In the meantime, I will also open an issue to ignore such failures and fall back to the original fastq file for the given sample.
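Until something like that lands, one possible config-level workaround (untested here) is to let the run continue past a failed PORECHOP task. The selector name below is taken from the process name in the error message; note that the ignored sample simply drops out of porechop's output channel, so this only helps if downstream steps can still use the raw fastq:

process {
    withName: 'PORECHOP' {
        // Do not abort the whole workflow when this one task fails
        errorStrategy = 'ignore'
    }
}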
Description of the bug
I am testing taxtriage on the CLI with ONT metagenomics data and am having issues with porechop. 4/5 samples were pre-processed (low-complexity and quality filtering, adapter clipping) using another pipeline, and I left one sample unprocessed to test with taxtriage. The pipeline fails at the porechop step for the unprocessed sample (Sample65). The pipeline works fine when trim is set to "FALSE" for all samples.
Command used and terminal output
Relevant files
No response
System information
Nextflow version: 23.04.1