ncbi / magicblast

34 stars 16 forks source link

Meaningful MAPQ values would be excellent. #39

Open JohnUrban opened 2 years ago

JohnUrban commented 2 years ago

Hi all,

I am trying out Magic-BLAST for mapping short-reads, and may want to test it in various pipelines.

I noticed, however, that the MAPQ field in the SAM output is either 0 or 255.

At first I thought that might be a classification of "multiread" vs "unique", however I then noticed it was just "unmapped" vs "mapped".

Are there plans to add more meaningful MAPQ values in the future? By "meaningful" I mean any of the following:

In terms of a feature request, I think more meaningful MAPQ values would make Magic-BLAST more straightforward to adapt into my own pipelines, thinking, and approaches. Perhaps for others as well.

Let me know if I am missing something.

What exactly was the intended use case for Magic-BLAST? Was it meant to potentially replace older progams in pipelines (e.g. bowtie2, BWA, etc)?

Best,

John

boratyng commented 2 years ago

Hi @JohnUrban,

Thank you for trying Magic-BLAST. Your observation is correct that Magic-BLAST does not report MAPQ values. We report 255 for aligned reads and zero for unaligned ones. In SAM specification 255 means that MAPQ value is unavailable. The main reason for not reporting MAPQ is that Magic-BLAST does not use base quality scores to select between read alignments.

Currently we have no plans for including MAPQ scores in SAM report, because so far none asked for them.

In terms of a feature request, I think more meaningful MAPQ values would make Magic-BLAST more straightforward to adapt into my own pipelines, thinking, and approaches. Perhaps for others as well.

Would it still be useful to you if we added MAPQ scores based only number of alignments found, without considering base quality scores?

What exactly was the intended use case for Magic-BLAST? Was it meant to potentially replace older progams in pipelines (e.g. bowtie2, BWA, etc)?

The intended use case is read mapping, just like bowtie2 or BWA, or newer tools like hisat2 or minimap2. It can be used in pipelines.

Thanks, Greg

JohnUrban commented 2 years ago

Hi Greg,

Ultimately, the direct answer to your question, "Would it still be useful to you if we added MAPQ scores based only number of alignments found, without considering base quality scores?", is: Yes, absolutely this would help.

Mapping quality scores are usually based on the number of alignments found, not necessarily base qualities (although base qualities can be used to modify bonuses or penalties when scoring an alignment). Rather than saying a read "uniquely mapped" or it "multi mapped", MAPQ gives a measure of uniqueness. How unique was the mapping? Or how uncertain was the mapping? If the read maps to only one spot in the given alignment scoring scheme with a perfect alignment score, then that would have the highest MAPQ (e.g. 40 or 42 for bowtie2, 60 for bwa mem). If it maps to only one spot with an imperfect score, the MAPQ lowers accordingly. If it maps to only two spots, but with the exact same alignment score, then there is the probability it was mapped wrong = 0.5, and MAPQ = -10*log10(0.5) = 3 (round to nearest int). If it it maps to only 3 sites, but all equally well, MAPQ=2. If 4 to 7 sites, MAPQ = 1. When > 7 sites, MAPQ=0. Something like that.

Often the read will map to two or more sites with different alignment scores. I've written fairly extensively about how Bowtie2 computes MAPQ in all these situations. Check out those blogs if you want:

If you read some of those blogs and look at the bowtie2 code, you can see that MAPQ assignments are somewhat rule-based. I'm not sure how closely they correlate with the theoretical formulation of -10*log10(mapped wrong), but it is a system I understand. I've been wanting to delve into the BWA code to see how its computed there. I've been told that BWA follows the theoretical idea more closely, but cannot confirm that.

Best,

John

JohnUrban commented 2 years ago

p.s. Here is my python implementation of the bowtie2 MAPQ calculator. https://github.com/JohnUrban/MAPQcalculators

boratyng commented 2 years ago

John,

Thank you for all the information and links. We are going through them.

Greg