nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

DSL2: Add sharding of FASTQs before alignment #1006

Closed: shyama-mama closed this 8 months ago

shyama-mama commented 1 year ago

Alignment of ancient samples takes a long time because of misincorporations from DNA damage and the variety of contaminating sequences. Some computing environments impose a limit on how long a task can run, and samples with many reads can exceed it, sometimes taking more than 72 hours to finish alignment. Increasing the number of threads used for alignment speeds this up, but very large samples can still take a long time, so a more scalable solution is needed.

One solution is to split large FASTQs into smaller chunks, align the chunks in parallel, and merge the results before post-processing. In essence, this artificially increases the number of cores working on the sample at the same time.
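
To make the splitting step concrete, here is a minimal Python sketch of sharding a FASTQ by read count. It is only an illustration, not how the pipeline would necessarily implement it: the function names `shard_fastq` and `open_text`, the `shard_NNNN.fastq` naming scheme and the `reads_per_shard` parameter are all hypothetical, and it assumes standard four-line FASTQ records (no wrapped sequence lines).

```python
import gzip
import itertools
from pathlib import Path


def open_text(path, mode):
    """Open a plain or gzip-compressed text file, chosen by extension."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, mode + "t")
    return open(path, mode)


def shard_fastq(fastq_path, out_dir, reads_per_shard):
    """Split a FASTQ into shards of at most `reads_per_shard` reads.

    Assumes four lines per record. Returns the shard paths in input order,
    so concatenating the shards reproduces the original file.
    """
    if reads_per_shard < 1:
        raise ValueError("reads_per_shard must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    shard_paths = []
    with open_text(fastq_path, "r") as handle:
        while True:
            # Pull the next block of whole records (4 lines per read).
            chunk = list(itertools.islice(handle, 4 * reads_per_shard))
            if not chunk:
                break
            if len(chunk) % 4 != 0:
                raise ValueError("truncated FASTQ record at end of input")
            # Hypothetical naming scheme: shard_0000.fastq, shard_0001.fastq, ...
            shard_path = out_dir / f"shard_{len(shard_paths):04d}.fastq"
            with open(shard_path, "w") as out:
                out.writelines(chunk)
            shard_paths.append(shard_path)
    return shard_paths
```

Each shard could then be aligned as its own task and the per-shard alignments merged before post-processing; the alignment and merge steps are not shown here. For paired-end data, both mates would need to be sharded with the same `reads_per_shard` so that read pairs stay in matching shards.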