jiantao / Tangram

Fast Structural Variation Detection Toolbox
MIT License

MOSAIK bam files for Tangram #7

Closed RPSeq closed 9 years ago

RPSeq commented 9 years ago

Jiantao,

I've been running Tangram on some test sequences aligned with MOSAIK using the '-sref' option (and the hg19 + moblist MEI reference). With this option, MOSAIK produces two BAM files: one with reads that aligned to the special references, and one with all other alignments. For Tangram, should I use only the special.bam file, or should I merge the two files and use all the aligned reads?

Thanks again,

Ryan

AlistairNWard commented 9 years ago

Hi Ryan,

For Tangram, use the standard output BAM file, not special.bam; you don't need to merge the two files. During alignment, Mosaik also attempts to align every read to the 'special' reference sequences. These reference sequences do not appear in the BAM header, and no read is listed as mapped to them in the position field of its BAM record. Instead, a read that mapped to a mobile element is written to the BAM file with the same coordinates as its uniquely mapped mate (as opposed to an arbitrary MEI position somewhere in the genome), and the ZA tag at the end of the BAM record notes that the alignment hit one of these sequences. For example, if you grep for L1 in the BAM records viewed as text (for example, through samtools view), you should see that some reads have L1 in the ZA tag at the end of the record.

When Tangram reads this BAM file, it looks for two signals: fragment-length discordance between read pairs, and pairs in which one mate is uniquely mapped and the other maps to an MEI. It identifies the second case from the ZA tag and from the fact that the two reads appear together in the BAM file. It also attempts split-read mapping across MEI breakpoints, and then collates all of these signals.
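As a rough illustration of the grep check described above, here is a minimal Python sketch that scans SAM-formatted text lines and reports records whose ZA tag mentions a given element name. It assumes only the general SAM layout (eleven mandatory tab-separated columns followed by optional `TAG:TYPE:VALUE` fields) and that the ZA tag is a string field (`ZA:Z:`). The function names `za_tag` and `reads_hitting` are made up for this example, and the internal format of the ZA value is not parsed, because the thread does not describe it.

```python
def za_tag(sam_line):
    """Return the value of the ZA optional field, or None if the record has none."""
    fields = sam_line.rstrip("\n").split("\t")
    # Optional fields start after the 11 mandatory SAM columns.
    for field in fields[11:]:
        if field.startswith("ZA:Z:"):
            return field[len("ZA:Z:"):]
    return None


def reads_hitting(sam_lines, element="L1"):
    """Yield (read name, reference, position) for records whose ZA tag mentions element.

    This is a plain substring test, like the grep suggested in the thread,
    so it can also match unrelated text that happens to contain the name.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip SAM header lines
        tag = za_tag(line)
        if tag is not None and element in tag:
            name, _flag, ref, pos = line.split("\t")[:4]
            yield name, ref, int(pos)
```

One way to use it would be to pipe the text output of `samtools view` for the standard output BAM into a small script that passes `sys.stdin` to `reads_hitting`. The reported reference and position should then be those of the record as placed next to its mate, consistent with the explanation above.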

I hope this helps.

Alistair Ward

In reply to Ryan Smith's message of Wed, May 13, 2015, 1:16 PM (quoted in full above).

RPSeq commented 9 years ago

Thanks for the info, everything seems to be working properly with my data.