hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License

Support MC flag as used by biobambam #15

Closed chapmanb closed 9 years ago

chapmanb commented 9 years ago

Colby, congrats on the SpeedSeq paper and thanks for all the great work on lumpyexpress and svtyper. This fixes an issue with biobambam-produced BAMs. Let me know if you have any questions or different ideas about how to approach supporting it.

biobambam (https://github.com/gt1/biobambam) uses the MC SAM tag to store the coordinate of the mate in its duplicate marking. If biobambam is run after samblaster (for merging/sorting/de-duplicating split BAMs), it overwrites the MC tag that samblaster wrote. The original code assumed the MC value was a string and failed on biobambam's value; this commit avoids that failure and makes use of the tag when it is present.
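To illustrate the clash described above, here is a minimal sketch of telling the two MC meanings apart. It is not the code from this commit: `interpret_mc` is a hypothetical helper, and it assumes the BAM reader hands back a Python `str` for a mate CIGAR and an `int` for a biobambam-style mate coordinate.

```python
import re

# A complete CIGAR string such as "76M" or "30S46M".
# "*" (CIGAR unavailable) deliberately does not match.
CIGAR_RE = re.compile(r"^(?:\d+[MIDNSHP=X])+$")


def interpret_mc(value):
    """Classify an MC tag value (hypothetical helper, for illustration only).

    Returns ("cigar", str) when the value looks like a mate CIGAR string,
    ("coordinate", int) when it is an integer (assumed here to be a mate
    coordinate, as in the biobambam usage described above), and
    (None, None) when the tag is missing or unusable.
    """
    if value is None:
        return (None, None)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, int) and not isinstance(value, bool):
        return ("coordinate", value)
    if isinstance(value, str) and CIGAR_RE.match(value):
        return ("cigar", value)
    return (None, None)
```

A caller would branch on the first element of the result rather than assuming a string, which is the kind of assumption that caused the original failure.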

cc2qe commented 9 years ago

Great, thanks for the fix. I merged it for compatibility with biobambam, although MC is an explicitly defined tag in the SAM spec so other software may also use it for mate CIGAR strings.

chapmanb commented 9 years ago

Colby -- thanks much for merging this and for the heads up on the MC tag in the spec. This fix is a big help to get it running with our current biobambam implementation. I dug into this to prepare a report for @gt1 and realized that biobambam2 uses a different set of tags to avoid this clash and provides tools to convert over (https://github.com/gt1/biobambam2/blob/master/src/programs/bamtagconversion.1). German, do you recommend using biobambam2 now instead of biobambam?

gt1 commented 9 years ago

I have stopped working on biobambam, so improvements will only ever show up in biobambam2. If you need any of these, then I suggest using biobambam2, as I will not backport any code.

chapmanb commented 9 years ago

German -- thank you. I moved things over to the latest biobambam2 to resolve this issue. Much appreciated.