jhuapl-bio / taxtriage

TaxTriage is a Nextflow workflow designed to agnostically identify and classify microbial organisms within short- or long-read metagenomic NGS data. This flexible tool was developed with various use-cases of mNGS in mind.
MIT License

Pipeline fails at Porechop step #55

Closed poojasgupta closed 8 months ago

poojasgupta commented 9 months ago

Description of the bug

I am testing taxtriage from the CLI with ONT metagenomics data and am having issues with porechop. Four of the five samples were pre-processed (low-complexity and quality filtering, adapter clipping) using another pipeline, and I left one sample unprocessed to test with taxtriage. The pipeline fails at the porechop step for the unprocessed sample (Sample65). The pipeline works fine when trim is set to "FALSE" for all samples.

Command used and terminal output

Nextflow command:
nextflow run jhuapl-bio/taxtriage --input samplesheet.csv --outdir taxtriage_out2 -profile docker --db /Volumes/IDGenomics_NAS/Data/kraken2_plus_pf16gb/k2_pluspf_16gb_20231009 -r main -latest --reference_assembly --remove_taxids \'9606\' -resume

Samplesheet:
sample,platform,fastq_1,fastq_2,sequencing_summary,trim
Sample14_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample14_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample15_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample15_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample59_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample59_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample62_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample62_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample65_UT-P2S01293-240126,OXFORD,combined/Sample65_UT-P2S01293-240126.fastq.gz,,,TRUE

Error:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'

Caused by:
  Process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` terminated with an error exit status (137)

Command executed:

  porechop \
      -i Sample65_UT-P2S01293-240126.fastq.gz \
      -t 12 \
       \
      -o Sample65_UT-P2S01293-240126.fastq.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP":
      porechop: $( porechop --version )
  END_VERSIONS

Command exit status:
  137

Command output:

  Loading reads
  Sample65_UT-P2S01293-240126.fastq.gz

Command error:

  Loading reads
  Sample65_UT-P2S01293-240126.fastq.gz
  .command.sh: line 6:    29 Killed                  porechop -i Sample65_UT-P2S01293-240126.fastq.gz -t 12 -o Sample65_UT-P2S01293-240126.fastq.gz

Work dir:
  /Volumes/BioNGS_1/UT-P2S01293-240126/UT-P2S01293-240126/20240126_2026_P2S-01293-A_PAS21742_bded1596/work/ea/e089eaf142e00437c60686780fa8de

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

Relevant files

No response

System information

Nextflow version (version 23.04.1)

Merritt-Brian commented 8 months ago

This looks like a hardware issue, most likely insufficient RAM. What is the average size of the fastq file for each sample? Since you're running this on an HPC, I would make sure the system has enough RAM for this step, as porechop is VERY RAM hungry.
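For readers wondering why exit status 137 points toward memory: POSIX shells report 128 + N when a command is terminated by signal N, so 137 corresponds to signal 9 (SIGKILL), which matches the "Killed" line in the log above. The kernel's out-of-memory killer is a common source of SIGKILL, though not the only one. A minimal sketch of the decoding, assuming a POSIX system (the function name is made up for illustration):

```python
import signal


def describe_exit_status(status: int) -> str:
    """Translate a shell exit status into a short description.

    POSIX shells report 128 + N when a command was terminated by signal N.
    A status above 128 can also be an ordinary exit code, so treat the
    result as a hint rather than proof.
    """
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # Not a signal number known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {status}"


print(describe_exit_status(137))
```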

Merritt-Brian commented 8 months ago

I've added a "retry" feature to "main" that incrementally increases the RAM requested for Porechop, retrying up to 3 times and up to 108 GB. Can you reattempt with your data?
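The actual change in the pipeline is not shown in this thread, so the following is only a sketch of the general idea: request more memory on each attempt and clamp the request at a ceiling. The 36 GB base and the linear scaling are assumptions, chosen so that the third attempt lands on the 108 GB limit mentioned above.

```python
def requested_memory_gb(attempt: int, base_gb: int = 36, cap_gb: int = 108) -> int:
    """Memory to request for a given attempt number (1-based).

    base_gb and the linear growth are illustrative assumptions, not the
    values used by taxtriage; only the 108 GB cap comes from the thread.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return min(base_gb * attempt, cap_gb)


for attempt in range(1, 4):
    print(f"attempt {attempt}: {requested_memory_gb(attempt)} GB")
```

Clamping with `min` keeps later attempts from asking for more memory than the configured ceiling.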

poojasgupta commented 8 months ago

Sure, I will give it a try and see if it works. Thank you!

Merritt-Brian commented 8 months ago

@poojasgupta did the changes fix the issue with porechop for you?

poojasgupta commented 8 months ago

@Merritt-Brian I tried the updated pipeline with my data and it still fails at the porechop step. The error isn't very clear to me.

Workflow execution completed unsuccessfully!
The exit status of the task that caused the workflow execution to fail was: 0.
The full error message was:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'

Caused by:
  Missing output file(s) `*.fastq.gz` expected by process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` (note: input files are not included in the default matching set)

Command executed:

  porechop \
      -i Sample65_UT-P2S01293-240126.fastq.gz \
      -t 12 \
       \
      -o Sample65_UT-P2S01293-240126.fastq.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP":
      porechop: $( porechop --version )
  END_VERSIONS

Command exit status:
  0

Command output:
    Barcode 56 (forward)                      79.2       79.2
    Barcode 57 (forward)                      79.2       76.0
    Barcode 58 (forward)                      77.8       76.9
    Barcode 59 (forward)                      77.8       76.0
    Barcode 60 (forward)                      79.2       80.0
    Barcode 61 (forward)                      76.0       76.9
    Barcode 62 (forward)                      80.0       80.0
    Barcode 63 (forward)                      79.2       79.2
    Barcode 64 (forward)                      79.2       79.2
    Barcode 65 (forward)                      80.0       80.0
    Barcode 66 (forward)                      76.0       79.2
    Barcode 67 (forward)                      80.0       84.0
    Barcode 68 (forward)                      80.0       76.0
    Barcode 69 (forward)                      76.9       76.9
    Barcode 70 (forward)                      79.2       79.2
    Barcode 71 (forward)                      79.2       79.2
    Barcode 72 (forward)                      80.0       77.8
    Barcode 73 (forward)                      76.9       80.0
    Barcode 74 (forward)                      79.2       77.8
    Barcode 75 (forward)                      79.2       76.9
    Barcode 76 (forward)                      79.2       76.9
    Barcode 77 (forward)                      76.9       84.0
    Barcode 78 (forward)                      80.8       80.8
    Barcode 79 (forward)                      80.8       79.2
    Barcode 80 (forward)                      76.9       75.0
    Barcode 81 (forward)                      76.9       79.2
    Barcode 82 (forward)                      77.8       76.9
    Barcode 83 (forward)                      80.0       76.9
    Barcode 84 (forward)                      76.0       76.0
    Barcode 85 (forward)                      83.3       79.2
    Barcode 86 (forward)                      76.9       80.0
    Barcode 87 (forward)                      76.9       76.9
    Barcode 88 (forward)                      75.0       76.9
    Barcode 89 (forward)                      80.0       80.0
    Barcode 90 (forward)                      79.2       76.0
    Barcode 91 (forward)                      75.9       79.2
    Barcode 92 (forward)                      76.0       77.8
    Barcode 93 (forward)                      79.2       79.2
    Barcode 94 (forward)                      79.2       76.0
    Barcode 95 (forward)                      83.3       76.9
    Barcode 96 (forward)                      76.0       76.9

  No adapters found - output reads are unchanged from input reads

  Saving trimmed reads to file
  pigz not found - using gzip to compress

  Saved result to Sample65_UT-P2S01293-240126.fastq.gz

Work dir:
  /Volumes/BioNGS_1/UT-P2S01293-240126/UT-P2S01293-240126/20240126_2026_P2S-01293-A_PAS21742_bded1596/work/68/39e44c153c3282f7fb9e5240c7ba1d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file

My nextflow command:

nextflow run jhuapl-bio/taxtriage --input samplesheet.csv --outdir taxtriage_out2 -profile docker --db /Volumes/IDGenomics_NAS/Data/kraken2_plus_pf16gb/k2_pluspf_16gb_20231009 -r main -latest --remove_taxids \'9606\' --max_memory 128GB --max_cpus 16 --denovo_assembly

I had tried previously without --max_memory and --max_cpus flags but still got the same error.

Merritt-Brian commented 8 months ago

Doesn't look like it is a hardware issue anymore.

The full error message was:
Error executing process > 'NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)'

Caused by:
  Missing output file(s) `*.fastq.gz` expected by process `NFCORE_TAXTRIAGE:TAXTRIAGE:PORECHOP (Sample65_UT-P2S01293-240126)` (note: input files are not included in the default matching set)

This seems to indicate that none of the reads contain the specific barcodes.

A limitation of the default porechop module for Nextflow is that if there is no difference between the initial fastq file and the porechop output, it will not emit a new file and skips the step. Since the pipeline doesn't detect an output file (the command just exits), it fails. I can look into overriding the default porechop module. The other solution would be to disable porechop for that specific sample.
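The parenthetical note in the error message says that input files are not included when matching outputs, and the logged command passes the same filename to both `-i` and `-o`. Whether that is the cause in this run is not established here, but a toy model of the rule shows why an output that shares its name with an input would go undetected. This is an illustration only, not Nextflow's implementation; the function name and the `Sample65_trimmed.fastq.gz` filename are hypothetical.

```python
from fnmatch import fnmatch


def collect_outputs(work_dir_files, input_files, pattern):
    """Return files matching the output pattern, skipping staged inputs."""
    staged = set(input_files)
    return [
        name
        for name in work_dir_files
        if fnmatch(name, pattern) and name not in staged
    ]


reads = "Sample65_UT-P2S01293-240126.fastq.gz"

# Output written under the same name as the input: nothing is collected.
print(collect_outputs([reads, "versions.yml"], [reads], "*.fastq.gz"))

# Output written under a distinct (hypothetical) name: it is collected.
print(collect_outputs(
    [reads, "Sample65_trimmed.fastq.gz", "versions.yml"], [reads], "*.fastq.gz"
))
```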

poojasgupta commented 8 months ago

Does porechop modify the original fastq file instead of creating a new "processed" file as an output? For the current run, I can disable porechop, but I would still like to run it in future runs as a pre-processing/quality control tool.

Merritt-Brian commented 8 months ago

It creates a new file, unfortunately, as is the nature of how Nextflow works. The error results from a mandatory output (in this case a new, modified fastq file) being required for each sample. If there is nothing for porechop to trim/bin, the process will fail, so it is recommended to disable "trim" for any samples where that will occur.
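Disabling "trim" for one sample comes down to changing the last column of its samplesheet row. A small sketch of doing that programmatically, using the column names and two of the rows from the samplesheet posted above (the helper name is made up for illustration):

```python
import csv
import io

SAMPLESHEET = """\
sample,platform,fastq_1,fastq_2,sequencing_summary,trim
Sample62_UT-P2S01293-240126,OXFORD,taxprofiler/taxprofiler_out/analysis_ready_fastqs/Sample62_UT-P2S01293-240126_run1_filtered.fastq.gz,,,TRUE
Sample65_UT-P2S01293-240126,OXFORD,combined/Sample65_UT-P2S01293-240126.fastq.gz,,,TRUE
"""


def disable_trim(samplesheet_text: str, samples: set) -> str:
    """Return the samplesheet with trim set to FALSE for the named samples."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["sample"] in samples:
            row["trim"] = "FALSE"
        writer.writerow(row)
    return out.getvalue()


print(disable_trim(SAMPLESHEET, {"Sample65_UT-P2S01293-240126"}))
```

Rows for samples that are not named keep their existing trim value.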

poojasgupta commented 8 months ago

Oh okay! Thank you!

Merritt-Brian commented 8 months ago

I will close this for now; please let me know if the problem persists with porechop for a sample that is expected to contain trimmable data. In the meantime, I will also open an issue to ignore such failures and fall back to the original fastq file for the given sample.
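The fallback described here could look roughly like the following: use the trimmed file when it exists and is non-empty, otherwise pass the original reads downstream. This is a sketch of the idea only, not the pipeline's code; the function name and the filenames are hypothetical.

```python
import tempfile
from pathlib import Path


def reads_for_downstream(original: Path, trimmed: Path) -> Path:
    """Prefer trimmed reads; fall back to the original if trimming left nothing usable."""
    if trimmed.is_file() and trimmed.stat().st_size > 0:
        return trimmed
    return original


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "sample.fastq.gz"
    trimmed = Path(tmp) / "sample.trimmed.fastq.gz"
    original.write_bytes(b"placeholder reads")

    # No trimmed file was produced, so the original is used.
    print(reads_for_downstream(original, trimmed).name)

    # Once a non-empty trimmed file exists, it takes precedence.
    trimmed.write_bytes(b"placeholder trimmed reads")
    print(reads_for_downstream(original, trimmed).name)
```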