magicDGS / ReadTools

A Universal Toolkit for Handling Sequence Data from Different Sequencing Platforms
https://magicdgs.github.io/ReadTools/
MIT License

DownloadDistmapResult: merge SAM header (feature request) #511

Closed: robmaz closed this issue 5 years ago

robmaz commented 5 years ago

Following up on the idea of a BAM-based pipeline, it would be super useful if you could merge a saved SAM header (in particular the RG info) back into the generated BAM.

magicDGS commented 5 years ago

Thanks for the proposal; this one is a bit more complicated. I need to clarify some specific requirements and possible difficulties in the implementation:

  1. How are the read groups assigned to the reads? Distmap does not retain any information about them (only the barcode, which would mean redoing the barcode matching).
  2. Should the header be merged with the one coming from the mapping, or just overridden completely?
  3. If it is overridden, what should happen when there are conflicts with previous header lines? For example, when remapping a previously mapped file with Distmap.
robmaz commented 5 years ago

What I originally had in mind was basically to put back the previously stored RG info from the header, assigning all reads to the same group; I realize that, as a generic problem, this is not so straightforward. The old PG info could be restored, assuming that all reads were processed in the same way. I guess remapping should replace the SQ lines with the new ones, but retain a PG line for the previous mapping? I would say it cannot be ReadTools' duty to resolve ambiguities; one has to assume that the person who wants the header merged knows what they are doing. If there are two RGs, for example, just bail out with an error message.

So the assumption should be that this is a header that can be unambiguously added and is a single PG step (the mapping) away from the current header. If that does not seem to be true, just fail with a corresponding error message.

That means:
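The merge policy described in the comment above (restore a single RG, keep the old PG history, take the @SQ lines from the new mapping, and fail on anything ambiguous) can be sketched as follows. This is a hypothetical illustration, not ReadTools code: it works on plain SAM text header lines instead of htsjdk's `SAMFileHeader`, and the class and method names are made up. It also assumes the last @PG line in the saved header is the tail of the old program chain, and it does not add `RG:Z` tags to the reads themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the merge policy discussed in this issue; not part of ReadTools.
// Headers are modeled as SAM text header lines (tab-delimited), not htsjdk objects.
public class HeaderMergeSketch {

    // Collects the lines whose record type is `tag` (e.g. "@RG"), preserving order.
    static List<String> linesOfType(List<String> header, String tag) {
        List<String> out = new ArrayList<>();
        for (String line : header) {
            if (line.startsWith(tag + "\t")) {
                out.add(line);
            }
        }
        return out;
    }

    // Returns the value of the ID field of a header line.
    static String idOf(String line) {
        for (String field : line.split("\t")) {
            if (field.startsWith("ID:")) {
                return field.substring(3);
            }
        }
        throw new IllegalArgumentException("header line without ID: " + line);
    }

    // Merges the saved (pre-mapping) header into the header produced by the mapper.
    // Fails instead of guessing when the input is ambiguous.
    static List<String> merge(List<String> saved, List<String> mapped) {
        List<String> savedRg = linesOfType(saved, "@RG");
        if (savedRg.size() != 1) {
            throw new IllegalArgumentException(
                    "expected exactly one @RG line in the saved header, found " + savedRg.size());
        }
        List<String> mappedPg = linesOfType(mapped, "@PG");
        if (mappedPg.size() != 1) {
            throw new IllegalArgumentException(
                    "expected the mapped header to be a single @PG step away, found " + mappedPg.size());
        }
        List<String> savedPg = linesOfType(saved, "@PG");
        String newPg = mappedPg.get(0);
        for (String pg : savedPg) {
            if (idOf(pg).equals(idOf(newPg))) {
                throw new IllegalArgumentException("@PG ID clash: " + idOf(newPg));
            }
        }
        // Chain the mapping step to the old history via PP.
        // Assumption: the last saved @PG line is the tail of the old chain.
        if (!savedPg.isEmpty() && !newPg.contains("\tPP:")) {
            newPg = newPg + "\tPP:" + idOf(savedPg.get(savedPg.size() - 1));
        }
        List<String> merged = new ArrayList<>(linesOfType(mapped, "@HD"));
        merged.addAll(linesOfType(mapped, "@SQ")); // new reference replaces any old @SQ lines
        merged.addAll(savedRg);                    // restored read group
        merged.addAll(savedPg);                    // restored processing history
        merged.add(newPg);                         // the mapping step itself
        return merged;
    }

    public static void main(String[] args) {
        List<String> saved = List.of(
                "@HD\tVN:1.6",
                "@RG\tID:lane1\tSM:sample1",
                "@PG\tID:ReadTools\tPN:ReadTools");
        List<String> mapped = List.of(
                "@HD\tVN:1.6\tSO:coordinate",
                "@SQ\tSN:chr1\tLN:1000",
                "@PG\tID:bwa\tPN:bwa");
        merge(saved, mapped).forEach(System.out::println);
    }
}
```

With the example inputs, the merged header keeps the mapper's @HD and @SQ lines, restores the saved @RG and ReadTools @PG lines, and chains the `bwa` @PG line to the old history with `PP:ReadTools`. Throwing on two RGs, several new PG lines, or a PG ID clash follows the "bail out with an error message" approach suggested above rather than trying to resolve the ambiguity.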

magicDGS commented 5 years ago

I linked your comment in the new issue to preserve your idea, but I am closing this one in favor of #518 to keep the conversation in one place.