nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

DSL1: Add possibility to use MapDamage for damage estimation #1020

Closed TCLamnidis closed 8 months ago

TCLamnidis commented 11 months ago

Is your feature request related to a problem? Please describe

DamageProfiler does not have an option to limit the number of reads used for damage estimation. With really large BAM files, the time and memory required to get a damage estimate become unrealistic, even though the estimate is likely already quite accurate from 10k reads onwards.

Describe the solution you'd like

It would be good to include MapDamage2 as an alternative option for damage calculation, since it includes an option to limit the number of reads used. The current implementation of MapDamage2 in nf-core/eager only runs if the user has requested BAM rescaling. It would be nice to have a quick way of getting damage plots without needing to run the time-consuming rescaling step. Results from the stat estimation could also be passed to the rescaling step to speed that up if a user decides to do both.
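As a rough sketch of what such a call could look like, the helper below assembles a mapDamage command that estimates damage on a capped number of reads and does not request rescaling. The function name, the file names and the 10k default are made up for illustration. The flags (`-i`, `-r`, `-d`, `-n`) are the ones I remember from the mapDamage2 help text, so please check them against `mapDamage --help` for the version bundled with the pipeline.

```python
from typing import List, Optional


def build_mapdamage_command(
    bam: str,
    reference: str,
    out_dir: str,
    max_reads: Optional[int] = 10_000,
) -> List[str]:
    """Assemble a mapDamage call for damage estimation only (no rescaling).

    Hypothetical helper: it only builds the argument list and does not run
    anything. The flag names are assumed from the mapDamage2 documentation.
    """
    if max_reads is not None and max_reads < 1:
        raise ValueError("max_reads must be a positive integer or None")

    # -i input BAM, -r reference FASTA, -d output folder
    cmd = ["mapDamage", "-i", bam, "-r", reference, "-d", out_dir]

    if max_reads is not None:
        # -n/--downsample: a value >= 1 is taken as a fixed number of
        # randomly selected reads rather than a fraction.
        cmd += ["-n", str(max_reads)]

    # --rescale is deliberately left out, so only plots and stats are made.
    return cmd


if __name__ == "__main__":
    # File names are placeholders, not real pipeline outputs.
    print(" ".join(build_mapdamage_command("sample.bam", "ref.fasta", "mapdamage_out")))
```

Passing `max_reads=None` drops the `-n` flag, so all reads would be used, which matches what DamageProfiler does today.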

Describe alternatives you've considered

It is possible to use PMDtools as an alternative at the moment, but it is slower than either of the other tools and can fail unexpectedly at times.

Additional context

https://github.com/Integrative-Transcriptomics/DamageProfiler/issues/59
https://github.com/Integrative-Transcriptomics/DamageProfiler/issues/58

Sadly, even if these were to be fixed, newer versions of DamageProfiler cannot be used because their Java requirement clashes with GATK 3.5, so fixing these issues there would only fix the behaviour of the pipeline from version 3.0 onwards.