wwood / CoverM

Read coverage calculator for metagenomics
GNU General Public License v3.0

Calculations of min-read-percent-identity and min-read-aligned-length #161

Open Rridley7 opened 1 year ago

Rridley7 commented 1 year ago

Hello, I wanted to ask what methods and calculations are being used to calculate percent identity and alignment length percentage when filtering? Is this related to the NM tag, or parsing of the cigar string? Thanks!

wwood commented 1 year ago

Hi,

Sure, it is just based on NM tags plus lengths of the read etc. The cigar string cannot be reliably used for %ID because mapping software often does not distinguish between a match and a mismatch - both are encoded as 'M'.

ben
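To illustrate the point about 'M', here is a minimal Python sketch (not CoverM code). The helper `parse_cigar` and the two example records are hypothetical. It shows that a perfect alignment and one with five mismatches can carry the same cigar string, so only the NM tag tells them apart.

```python
import re

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def parse_cigar(cigar):
    """Split a CIGAR string into (length, op) pairs. Hypothetical helper."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

# Two made-up 100 bp alignments: one perfect, one with 5 mismatches.
# 'M' means "alignment match", which covers both matches and mismatches.
perfect = {"cigar": "100M", "NM": 0}
noisy = {"cigar": "100M", "NM": 5}

# The cigar strings are indistinguishable; the NM tags are not.
same_cigar = parse_cigar(perfect["cigar"]) == parse_cigar(noisy["cigar"])
same_nm = perfect["NM"] == noisy["NM"]
print(same_cigar, same_nm)
```

Mappers that emit '=' and 'X' instead of 'M' do make the distinction, but a filter cannot assume that every input BAM was produced that way.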

Rridley7 commented 1 year ago

I see. So if I'm understanding correctly, % identity is calculated as (length of aligned region - NM) / (length of aligned region)? Is the length of the aligned region taken from the cigar, from the start and end reference positions, or from something else?
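For concreteness, the formula proposed above can be sketched in Python as follows. This is only an illustration of the question, not CoverM's confirmed implementation. The function names are hypothetical, and the choice to count the M, I, D, = and X operations towards the aligned length is an assumption (it is consistent with NM counting mismatches plus inserted and deleted bases).

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Assumption: these operations count towards the aligned length.
# Soft/hard clips (S, H), skips (N) and padding (P) are excluded.
ALIGNED_OPS = set("MID=X")

def aligned_length(cigar):
    """Sum the lengths of the CIGAR operations assumed to be aligned."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in ALIGNED_OPS)

def percent_identity(cigar, nm):
    """(aligned length - NM) / aligned length, expressed as a percentage."""
    length = aligned_length(cigar)
    if length == 0:
        return 0.0  # avoid division by zero for reads with no aligned bases
    return 100.0 * (length - nm) / length

# 100 aligned bases with an edit distance of 5.
print(percent_identity("100M", 5))
```

Under these assumptions a soft-clipped read such as `5S90M2I3M` has an aligned length of 95, so the clipped bases do not lower its identity.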

Following on from this, is min-read-aligned-length then calculated as (alignment length - original read length) / original read length?