maickrau / GraphAligner

MIT License

Multiple alignments for input reads #75

Closed glennhickey closed 1 year ago

glennhickey commented 1 year ago

Hi! I'm using GraphAligner to map some HiFi reads to a pangenome graph. I've run it with -x vg and output to GAM, and everything seems reasonable.

But there are sometimes multiple GAM records for a single input read. In general, the first record seems to be the best-quality mapping. Subsequent records have lower scores and can also appear to be tiny fragments.

My question is: how to interpret this data? Are these split mappings? If so, couldn't they be embedded into a single line? As it stands, do you have a suggestion for how to produce basic mapping statistics for this type of output?

Thanks.

maickrau commented 1 year ago

If there are multiple alignments with similar scores, GraphAligner keeps them as secondary alignments. Multiple alignment lines should be ordered with the highest alignment score (best mapping) first. You can adjust the secondary alignment behavior with the parameter --multimap-score-fraction, which determines when to drop secondary alignments; setting it to 1 will keep only those whose alignment score is equal to the best one. This only applies to alignments which overlap in the read. If the alignments don't overlap, they are kept regardless of score; this represents a split mapping. You can also set a minimum score threshold with --min-alignment-score to discard tiny, likely false-positive alignments.
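
To make the overlap rule concrete, here is a minimal Python sketch of the behavior described in this comment. It is not GraphAligner's actual code. The `Aln` type, the function names, and the exact comparison (score at least `score_fraction` times the best overlapping score) are assumptions made for illustration; only the general rule (overlapping alignments are filtered by score, non-overlapping ones are kept as split mappings) comes from the explanation above.

```python
from typing import List, NamedTuple


class Aln(NamedTuple):
    """Hypothetical record for one alignment of a read."""
    read_start: int  # 0-based start on the read
    read_end: int    # exclusive end on the read
    score: float     # alignment score


def overlaps(a: Aln, b: Aln) -> bool:
    # Half-open intervals overlap when each starts before the other ends.
    return a.read_start < b.read_end and b.read_start < a.read_end


def select_alignments(alns: List[Aln], score_fraction: float) -> List[Aln]:
    """Keep alignments best-first, following the rule sketched above.

    ASSUMPTION: an alignment overlapping an already kept one survives only if
    its score is at least score_fraction times the best overlapping score.
    """
    kept: List[Aln] = []
    for aln in sorted(alns, key=lambda a: a.score, reverse=True):
        rivals = [k for k in kept if overlaps(aln, k)]
        if not rivals:
            # Covers a different part of the read: keep as a split alignment.
            kept.append(aln)
        elif aln.score >= score_fraction * max(k.score for k in rivals):
            # Overlaps a better alignment but scores close enough: secondary.
            kept.append(aln)
    return kept
```

Under these assumptions, a fraction of 1 keeps an overlapping alignment only when it ties the best score, while a fragment covering a different part of the read is kept whatever its score.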

For mapping statistics, I'd recommend filtering the GAF by mapq and keeping the split alignments. If you only want one alignment and no split alignments, pick the one with the highest alignment score (tag AS:f:). This might lead to biases for reads which have multiple equally good mappings.
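
Both suggestions can be sketched in a few lines of Python. This is a rough illustration, not a tested pipeline. It assumes the standard GAF layout of 12 tab-separated mandatory columns (read name in column 1, read start and end in columns 3 and 4, mapq in column 12) followed by optional `TAG:TYPE:VALUE` fields. The function names and any mapq threshold you pass in are hypothetical choices, not GraphAligner defaults.

```python
from typing import Dict, Iterable, List


def parse_gaf(lines: Iterable[str]) -> List[dict]:
    """Parse GAF lines into small dicts holding only the fields used here."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        cols = line.split("\t")
        # Optional fields look like TAG:TYPE:VALUE, for example AS:f:950.5
        tags = {}
        for field in cols[12:]:
            tag, _type, value = field.split(":", 2)
            tags[tag] = value
        records.append({
            "read": cols[0],
            "read_start": int(cols[2]),
            "read_end": int(cols[3]),
            "mapq": int(cols[11]),
            "score": float(tags["AS"]) if "AS" in tags else None,
        })
    return records


def filter_by_mapq(records: List[dict], min_mapq: int) -> List[dict]:
    """Keep every record (split alignments included) with mapq >= min_mapq."""
    return [r for r in records if r["mapq"] >= min_mapq]


def best_per_read(records: List[dict]) -> Dict[str, dict]:
    """Pick the single highest-scoring alignment per read.

    Ties go to the first record seen, so reads with several equally good
    mappings are resolved arbitrarily (the bias mentioned above).
    """
    best: Dict[str, dict] = {}
    for r in records:
        if r["score"] is None:
            continue
        current = best.get(r["read"])
        if current is None or r["score"] > current["score"]:
            best[r["read"]] = r
    return best
```

A typical use would be `filter_by_mapq(parse_gaf(open("aln.gaf")), 20)`, where both the file name and the threshold of 20 are placeholders to adapt to your data.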

glennhickey commented 1 year ago

OK, thanks for the explanation! Just to confirm, when you mention filtering by mapq, is this only doable in the GAF output? In my GAM, it does not look like any mapping_quality fields are set at all.

maickrau commented 1 year ago

Right, the mapping quality was missing from GAM. It's now added in commit 200a87b

glennhickey commented 1 year ago

That'll be handy, thanks!