Closed robmaz closed 5 years ago
Thanks for the proposal: this one is a bit more complicated. I need to know some specific requirements and possible difficulties in the implementation:
What I originally had in mind was basically to put the previously stored RG info from the header back in, assigning all reads to the same group; I realize that as a generic problem this is not so straightforward. The old PG info could be restored, assuming that all reads were processed in the same way. I guess remapping should replace the SQ lines with the new ones but retain a PG line for the previous mapping? I would say that it cannot be ReadTools' duty to resolve ambiguities; one has to assume that the person who wants the header merged knows what they are doing. If there are two RGs, for example, just bail out with an error message.
So the assumption should be that this is a header that can be added unambiguously and is a single PG step (the mapping) away from the current header. If that does not seem to be true, just fail with a corresponding error message.
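A minimal sketch of how that rule could look, assuming headers are handled as lists of tab-separated SAM header lines. This is not ReadTools code: `merge_saved_header`, `lines_of_type`, and `HeaderMergeError` are hypothetical names, and reading "a single PG step away" as "exactly one @PG line in the current header" is an assumption made for illustration.

```python
class HeaderMergeError(ValueError):
    """Raised when the saved header cannot be merged unambiguously."""


def lines_of_type(header_lines, record_type):
    """Return the header lines whose record type (e.g. '@RG') matches."""
    return [line for line in header_lines
            if line.split("\t", 1)[0] == record_type]


def merge_saved_header(saved, current):
    """Merge a saved SAM header into the header of a remapped file.

    Assumptions (hypothetical, for illustration only):
    - the saved header holds exactly one @RG line;
    - the current header holds exactly one @PG line (the mapping step).
    Anything else is treated as ambiguous and rejected.
    """
    saved_rg = lines_of_type(saved, "@RG")
    if len(saved_rg) != 1:
        raise HeaderMergeError(
            f"expected exactly one @RG in the saved header, found {len(saved_rg)}")

    current_pg = lines_of_type(current, "@PG")
    if len(current_pg) != 1:
        raise HeaderMergeError(
            f"expected exactly one @PG in the current header, found {len(current_pg)}")

    merged = []
    merged += lines_of_type(current, "@HD")   # keep the current @HD line
    merged += lines_of_type(current, "@SQ")   # new reference sequences win
    merged += saved_rg                        # restore the single read group
    merged += lines_of_type(saved, "@PG")     # earlier processing steps
    merged += current_pg                      # the mapping step comes last
    merged += lines_of_type(saved, "@CO")     # keep comments from both headers
    merged += lines_of_type(current, "@CO")
    return merged
```

The sketch only rebuilds the header; tagging every read with the restored read group would be a separate step.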
That means:
I linked your comment in the new issue to preserve your idea, but I am closing this one in favor of #518 to keep the conversation in one place.
Following up on the idea of a BAM-based pipeline, it would be super useful if you could merge a saved SAM header (in particular the RG info) back into the generated BAM.