Chunking and stitching to parallelise alignment

nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline

https://nf-co.re/eager

MIT License

Chunking and stitching to parallelise alignment #797

Closed pontussk closed 7 months ago

pontussk commented 3 years ago

It would be very useful to split large FASTQ files automatically so that alignment can run in parallel, and then to concatenate the aligned output files automatically. BWA alignment is often the most time-consuming step of processing.

This could speed up processing of, e.g., ancient mammalian genome sequencing, where a single library can yield 300-4000 million reads.
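
To illustrate the splitting half of this request, here is a minimal sketch in Python. It is not how nf-core/eager implements anything. The function name `split_fastq`, the chunk file naming, and the assumption of an uncompressed FASTQ with exactly four lines per read are all hypothetical choices made for the example.

```python
import itertools
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_chunk):
    """Split a FASTQ file into chunks of at most `reads_per_chunk` reads.

    Assumption: uncompressed input with the common 4-line record layout
    (header, sequence, '+', qualities), i.e. no line-wrapped sequences.
    Returns the chunk paths in input order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with open(fastq_path) as handle:
        for index in itertools.count():
            # Take the next block of whole records (4 lines per read).
            # Note: one chunk is held in memory at a time.
            lines = list(itertools.islice(handle, 4 * reads_per_chunk))
            if not lines:
                break
            # Hypothetical naming scheme; zero-padding keeps chunks sortable.
            chunk_path = out_dir / f"chunk_{index:04d}.fastq"
            chunk_path.write_text("".join(lines))
            chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk could then be aligned as an independent job, and the per-chunk alignments combined afterwards, for example with `samtools merge`. Because the chunks are cut on record boundaries and kept in order, concatenating them reproduces the original FASTQ exactly.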

jfy133 commented 3 years ago

This would be good for a centralised nf-core module. One for the upcoming hackathon!

yassineS commented 3 years ago

FYI, this could be a great option: https://github.com/bigdatagenomics/cannoli/issues/323. It works out of the box with Singularity and Docker.

jfy133 commented 7 months ago

This is already done with sharding from @shyama-mama!

https://github.com/nf-core/eager/pull/1023