nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

Entire pipeline crashes at DamageProfiler if no reads present #171

Closed jfy133 closed 5 years ago

jfy133 commented 5 years ago

Describe the bug

I was running EAGER 2.0.6 on a dataset containing both samples and blanks.

One of the blanks had no reads mapping to the reference genome. The pipeline crashed at the damageprofiler step because the expected .json file was missing: DamageProfiler does not create it when no reads are processed.

EAGER error

Mar-08 11:41:46.829 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'damageprofiler (R3Box_S0_L002_R1_001.sorted)'

Caused by:
  Missing output file(s) `*/*.json` expected by process `damageprofiler (R3Box_S0_L002_R1_001.sorted)`

Command executed:

  damageprofiler -i R3Box_S0_L002_R1_001.sorted.bam -r Porphyromonas_gingivalis_ATCC_33277.fasta -l 100 -t 15 -o .

Command exit status:
  0

Command output:
  DamageProfiler v0.4.4

Work dir:
  /projects1/microbiome_calculus/RIII/04-analysis/redcomplex_mapping/output/Porphyronomas_gingivalis/work/c3/c9d4155d94032c9df3879e5ef7e7d7

damageprofiler error (last few lines)

2019-03-08 11:41:37 INFO  StartCalculations:108 - 0 Reads processed. 
2019-03-08 11:41:37 INFO  StartCalculations:108 - 0 Reads processed.
2019-03-08 11:41:37 INFO  StartCalculations:952 - Values normalized ....
2019-03-08 11:41:37 INFO  StartCalculations:126 - -------------------
2019-03-08 11:41:37 INFO  StartCalculations:127 - # reads used for damage calculation: 0
2019-03-08 11:41:37 WARN  StartCalculations:333 - No reads processed. Can't create any output
2019-03-08 11:41:37 INFO  StartCalculations:276 - Runtime of Module was: 0 seconds.

To Reproduce

Run the pipeline with a sample that doesn't map to the reference.

Expected behavior

As damageprofiler isn't a crucial step for downstream steps, allow the pipeline to continue for other samples.

apeltzer commented 5 years ago

Yeah, we could simply add an error ignore in the base.config to accommodate this. Or we could make the JSON file an optional output. Does it write out the other files, though?
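
To illustrate the two options mentioned here, the following is a minimal Python sketch, not the pipeline's actual Nextflow code. It assumes the failure mode shown in the log above: the tool exits with status 0 but writes no JSON file. The names `collect_outputs`, `run_task`, and `MissingOutputError` are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


class MissingOutputError(RuntimeError):
    """A task finished with exit status 0 but a declared output is absent."""


def collect_outputs(workdir, pattern, optional=False):
    """Return files in workdir matching pattern; fail only if they are required."""
    found = sorted(Path(workdir).glob(pattern))
    if not found and not optional:
        raise MissingOutputError(f"Missing output file(s) `{pattern}`")
    return found


def run_task(workdir, pattern, optional=False, ignore_errors=False):
    """Hypothetical task wrapper: returns outputs, or [] when a failure is ignored."""
    try:
        return collect_outputs(workdir, pattern, optional=optional)
    except MissingOutputError:
        if ignore_errors:
            return []  # drop this sample and let the others continue
        raise


if __name__ == "__main__":
    # An empty work directory stands in for the blank: the tool ran, wrote no JSON.
    with tempfile.TemporaryDirectory() as blank:
        print(run_task(blank, "*/*.json", optional=True))       # option 2: optional output
        print(run_task(blank, "*/*.json", ignore_errors=True))  # option 1: ignore the error
        try:
            run_task(blank, "*/*.json")                         # current behaviour
        except MissingOutputError as err:
            print("task failed:", err)
```

Both options let the run continue. The optional-output variant only tolerates the missing file, whereas the ignore variant would also hide any other failure of the same task.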

jfy133 commented 5 years ago

No, just the DamageProfiler log. Nothing else was in the corresponding work/ directory.

apeltzer commented 5 years ago

I guess we can have the error ignore thing then in the next release ;-)

jfy133 commented 5 years ago

The DamageProfiler fix in https://github.com/nf-core/eager/pull/172/commits/c3a71e8b4991673e40e231746132a6b8d8dd7170 appears to work, but essentially the same issue occurs with Qualimap.

Can the same fix be applied?

Command error: 
  Failed to run bamqc 
  java.lang.RuntimeException: The BAM file is empty or corrupt 
        at org.bioinfo.ngs.qc.qualimap.process.BamStatsAnalysis.run(BamStatsAnalysis.java:529) 
        at org.bioinfo.ngs.qc.qualimap.main.BamQcTool.execute(BamQcTool.java:242) 
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartTool.run(NgsSmartTool.java:190) 
        at org.bioinfo.ngs.qc.qualimap.main.NgsSmartMain.main(NgsSmartMain.java:113) 

Work dir: 
  /projects1/microbiome_calculus/RIII/04-analysis/redcomplex_mapping/output/work/db/a2c3c58bc3524406c076b30dee6bfa

apeltzer commented 5 years ago

Yeah, fair point. Actually, the pipeline shouldn't fail if the input is valid but simply empty...

jfy133 commented 5 years ago

The difference from DamageProfiler is that Qualimap actually reports this as an error. Can you get around that?
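
The distinction raised here can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline's Nextflow code: `run_tool` is a hypothetical helper, and `FAILING` is a stand-in command that merely exits non-zero with the same message, not a real Qualimap invocation. Because the tool itself reports failure, tolerating a missing output file would not help; the non-zero exit status has to be ignored explicitly.

```python
import subprocess
import sys


def run_tool(cmd, ignore_errors=False):
    """Run cmd; return True on success, False if a failure was ignored."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return True
    if ignore_errors:
        # Report the failure and move on so the remaining samples still run.
        print(f"ignored failure (exit {proc.returncode}): {proc.stderr.strip()}")
        return False
    raise RuntimeError(f"exit {proc.returncode}: {proc.stderr.strip()}")


# Stand-in for a tool that treats an empty BAM as an error (hypothetical command).
FAILING = [sys.executable, "-c",
           "import sys; sys.exit('The BAM file is empty or corrupt')"]

if __name__ == "__main__":
    run_tool(FAILING, ignore_errors=True)  # failure is logged, run continues
    try:
        run_tool(FAILING)                  # default: the failure aborts the run
    except RuntimeError as err:
        print("run aborted:", err)
```

The trade-off of ignoring exit codes is that genuine failures on non-empty samples are skipped as well, so the logged message remains the only trace of them.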

apeltzer commented 5 years ago

I'm wondering whether this is possible in general - maybe!

apeltzer commented 5 years ago

This was addressed in #172

jfy133 commented 5 years ago

Running a test now. Did this fix the Qualimap issue too, or should I open a separate issue for that?

apeltzer commented 5 years ago

I did add a fix for Qualimap too. Neither tool behaves well with corrupt or empty data...

jfy133 commented 5 years ago

And indeed it is working through! Thanks!