smarco / gem3-mapper

GEM-Mapper v3
GNU General Public License v3.0
56 stars 17 forks source link

Supplementary hits #23

Open brendanofallon opened 3 years ago

brendanofallon commented 3 years ago

Howdy, thanks for your great work on GEM! I'm wondering if there's a way to get GEM to find / emit more supplementary hits with accompanying SA tag. This is important for some structural variant callers (e.g. Manta) to do their work. I see some reads get the XA tag with a similar format, but I think that's a little different.

Thanks!

smarco commented 3 years ago

Hi,

Option "--max-reported-matches|-M |'all' (default=5)" should do the trick. By default, it reports a maximum of 4 supplementary. Try first increasing this parameter; let's see it the result meets your needs.

./bin/gem-mapper --max-reported-matches 100

Let us known. Cheers,

brendanofallon commented 3 years ago

Thanks! I just tried that, but it's still not finding any supplementary hits for some of these reads, compared to BWA, at least). The reads do have a bunch of softclipped bases and relatively low mapping quality score, but no SA or XA tags.

Update: After scanning the whole BAM, I'm actually not finding any reads as being flagged as supplementary (according to samtools view -f 0x800..), maybe I'm doing something wrong? The command I'm executing is:

gem-mapper -I human_g1k_v37_decoy_phiXAdaptr.gem -M 100 -1 trimmed-pair1.fastq -2 trimmed-pair2.fastq -t 12 -r '"@RG   ID:None PL:ILLUMINA     LB:None SM:None PU:None"' | samtools sort -@ 12 -O BAM -o sorted.bam
smarco commented 3 years ago

Ok.

Then, let's try to push the mapper a little bit further:

gem-mapper [...] --mapping-mode=sensitive

Certainly, this will generate more alt-matches. Don't forget to include '--max-reported-matches 100' or similar. Otherwise, I can have a look myself until the tool floods you with alt-mappings.

Cheers,

brendanofallon commented 3 years ago

Hi again, I tried adding "--mapping-mode=sensitive" (and retaining -M 100), but I'm still not seeing any reads with the "supplementary" (0x800) flag set, or that have the SA tag, for the entire set of about 50M reads. This feels like it must be a bug (maybe mine?) - GEM should at least occasionally find reads with supplementary mappings, right?

smarco commented 3 years ago

It's weird. Nonetheless, if you can share index/reference (link to .fa file) and just 1 sequence. I can rerun and try to see what is the problem.

brendanofallon commented 3 years ago

Thanks for looking into this, I'm aligning to the human genome reference (hg19), here's an example of a read pair that BWA thinks should have a supplementary hit, but doesn't seem to get one:

@A00576:54:HCTG7DRXX:2:2260:31467:35697:CTGTA+CAACT CTTACCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATAAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGATATCAT + ADC@BBACBBE:EACBADCBBDBDD+CBCCB?CCCB:.DEBDBBCBBABD9@BA@DCEC1DBCC9CCB;,A+BC+BABE,BBDAB,??+6BD:CACEBED9-,@87A1,=>99+B-;BD,B.*8ABB6B,8+A?C;9+++:,8?

..and its pair: @A00576:54:HCTG7DRXX:2:2260:31467:35697:CTGTA+CAACT GAAGTACTCATTATCTGAGGAGCCGGTCACCTGTACCATCTGTAGCTGGCTTTCATACCTAAATTGCTTCAGAGATGAAATGATGAGTCAGTTAGGAATAGGCAGTTCTGCAGATAGAGGAAAGAATAATGAATTTTTACCTTTGC + ,=E++BECFC.B@BECDEDCFDD;CCCFCD;BBBC:FBCDCCBEEEC;DECDC:CBC+EBDE,DCE,C,EFDFDCCD>ECCCBBCECBFFB<BFDCDBBEDDFFCDCDCDEECBAECECCDDEDDB@DBACDBCBBBAACCCBACA

...not sure if this particular read pair will illuminate the issue (maybe GEM is missing this one for legitimate reasons), but I can supply a bunch more if needed. Thanks again!

smarco commented 3 years ago

Hi,

For the given sequence, they both seem to report the same results. The only difference is that BWA-MEM uses the SA tag and GEM the XA tag.

BWA-MEM => Primary chr13:28608214:96M50S and suplementary SA:Z:chr13,28608218,+,103S43M,60,1; Primary chr13:28608286:146M

GEM => Primary chr13:28608214:96M50S and extra-alignment XA:Z:chr13,+28608213,98S48M,100; Primary chr13:28608286:146M

brendanofallon commented 3 years ago

Yes, you're totally right, maybe that wasn't the greatest example. Below I pasted a read pair for which the XA tag isn't present at all. But the important point isn't that GEM might disagree sometimes with BWA (maybe GEM is more accurate and BWA is wrong!), it's that GEM never produces a read mapping with the SA tag or the supplementary flag set - not a single one in many of the alignments I've looked at. Is it expected that GEM doesn't produce supplementary hits at all?

Here's a read pair that doesn't have an XA tag (or SA, or supplementary bit set), but which gets a supplementary hit in BWA.

@A00576:54:HCTG7DRXX:1:2275:18656:16705:GTAGT+CGTGT 
ACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAG
+
AADBCBABCDEBD9BBAB@@CBAAABDCECEE9CCBCBCB;BADBCDBABEABBBABCEEADCBBCCECECEDEDCBCCBEC?ACCBDDABACECE6BCBBECBB@?@CD7B?ABA@>ACBA?@ACBDBEBAB?A@ABB:AADBB*

@A00576:54:HCTG7DRXX:1:2275:18656:16705:GTAGT+CGTGT
ACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAGTACTCATTATCTGAGGAGCCTCTCTTGCCAAACTCTAAATTTTCTCTTGGAAACTCCCATTTGAGATCATATTCATATTCTCTGAAATCAACGTAGAAG
+
AADBCBABCDEBD9BBAB@@CBAAABDCECEE9CCBCBCB;BADBCDBABEABBBABCEEADCBBCCECECEDEDCBCCBEC?ACCBDDABACECE6BCBBECBB@?@CD7B?ABA@>ACBA?@ACBDBEBAB?A@ABB:AADBB*
smarco commented 3 years ago

Hi,

(1) These 2 reads are, in fact, the same read duplicated. Is that what you meant? (2) Both BWA-MEM and GEM report essentially the same (using different tags). BWA seems to me a little bit redundant, but ok.

BWA Primary chr13 28608218 60 74S72M and suplementary SA:Z:chr13,28608243,+,67M79S,60,0; Primary chr13 28608243 60 67M79H and suplementary SA:Z:chr13,28608218,+,74S72M,60,0;

GEM Primary chr13 28608213 0 69S77M and extra-alignment XA:Z:chr13,+28608243,67M79S,79

(3) Supplementary hits

GEM never produces a read mapping with the SA tag or the supplementary flag set

Yes. GEM outputs these supplementary as extra-alignment. Different tag/format, just that.

Let me know your thoughts.

brendanofallon commented 3 years ago

Thanks again for your help with this! So is it correct to say that GEM doesn't emit reads with the SA tag or supplementary bit set, but instead encodes the same info in the XA tag? That's totally fine, just wanted to make sure that I'm not missing some important info. I think some structural variant detection tools (e.g. Lumpy, Manta) specifically look for reads with SA tags + supplementary bit to trigger a search for SVs, so we are currently missing some SV calls when aligning with GEM compared to BWA. If it's easy to alter GEM's output to use the SA tag in place of XA it might improve out-of-the-box sensitivity a bit for these types of variants. Thanks again for your hard work on GEM - it really is much faster than BWA and yields similar sensitivity and somewhat better PPV for small variants in our data

smarco commented 3 years ago

Thanks again for your help with this! So is it correct to say that GEM doesn't emit reads with the SA tag or supplementary bit set, but instead encodes the same info in the XA tag?

Yes, precisely.

If it's easy to alter GEM's output to use the SA tag in place of XA it might improve out-of-the-box sensitivity a bit for these types of variants.

Yes, it is very easy to adapt. I tag this issue as a feature request.

Thanks for your support and nice words.