nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License

Retain read group information in bam merging steps #808

Closed TCLamnidis closed 2 years ago

TCLamnidis commented 2 years ago

Is your feature request related to a problem? Please describe

Currently, every BAM merging step in nf-core/eager overwrites the read groups in the BAM file, discarding information that would otherwise let users trace specific reads back to a library or sequencing run. This information may survive in some form among the intermediate files, but it should not be discarded without cause.

This information can also be important for calling genotype likelihoods (which is not currently done within eager, but might be a good future addition).

Describe the solution you'd like

Each bam merging step should return the union of read groups, instead of overwriting that information.
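
For illustration, here is a minimal Python sketch of what "union of read groups" could mean at the header level. It works on plain SAM header text rather than on the pipeline's actual merging code. The function names, the example IDs, samples and libraries are all hypothetical, and keeping the first record when an ID repeats is an assumption, not a decided behaviour.

```python
def parse_rg_line(line):
    """Split one '@RG' header line into a dict of tag -> value."""
    fields = line.rstrip("\n").split("\t")[1:]  # drop the leading '@RG'
    return dict(field.split(":", 1) for field in fields if ":" in field)


def union_read_groups(headers):
    """Collect the @RG records from several SAM header texts.

    Records are keyed by their ID tag. If the same ID appears more than
    once, the first record seen is kept (an assumption for this sketch).
    """
    merged = {}
    for header in headers:
        for line in header.splitlines():
            if not line.startswith("@RG\t"):
                continue
            rg = parse_rg_line(line)
            merged.setdefault(rg["ID"], rg)
    return list(merged.values())


def format_rg(rg):
    """Turn a tag -> value dict back into an '@RG' header line."""
    return "@RG\t" + "\t".join(f"{tag}:{value}" for tag, value in rg.items())


# Hypothetical headers for two lanes of the same library.
lane1 = "@HD\tVN:1.6\tSO:coordinate\n@RG\tID:lib1_L001\tSM:sampleA\tLB:lib1\n"
lane2 = (
    "@HD\tVN:1.6\tSO:coordinate\n"
    "@RG\tID:lib1_L002\tSM:sampleA\tLB:lib1\n"
    "@RG\tID:lib1_L001\tSM:sampleA\tLB:lib1\n"
)

for rg in union_read_groups([lane1, lane2]):
    print(format_rg(rg))
```

With a header like this, each read's existing `RG` tag would still point at a valid `@RG` record after merging, so the library and run of origin stay traceable.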

Additional context

The current behaviour is (I think) a leftover from EAGER, related to how pathogen screening works and how GATK UnifiedGenotyper (UG) prefers its input BAMs.

I think tweaking the read groups produced during mapping could kill two birds with one stone. I am investigating this further.