hsinnan75 / MapCaller

MapCaller – An efficient and versatile approach for short-read alignment and variant detection in high-throughput sequenced genomes
MIT License
29 stars 5 forks source link

How does MapCaller handle multiple mapping? #35

Closed tseemann closed 4 years ago

tseemann commented 4 years ago

When you run bwa mem etc each mapped read in given a mapping quality (MAPQ). In bwa a MAPQ=60 means it was uniquely mapped, and MAPQ=0 means it mapped to >1 location equally well.

What does MapCaller do?

Can I only use uniquely mapping reads? In variant calling in bacteria we only want those, because multimapping reads introduce false-positive SNPs.

hsinnan75 commented 4 years ago

Don't worry about that. MapCaller only uses unique alignment for variant calling. For those reads with multiple alignments, they are used to annotate which reference regions are duplication (i.e., repetitive). We infer CNVs according to duplication regions. But I need more time to figure out how to define those regions.

tseemann commented 4 years ago

So for the UnMapped/Gap regions - they are where no reads align, not even multimap reads?

https://www.internationalgenome.org/wiki/Analysis/Variant%20Call%20Format/VCF%20(Variant%20Call%20Format)%20version%204.0/encoding-structural-variants/

##ALT=<ID=CNV,Description="Copy number variable region">
##ALT=<ID=DUP,Description="Duplication">

Have fun :-)

hsinnan75 commented 4 years ago

Hi. I forgot to check the multi-mapping when identifying unmapped/gap regions. I'll correct this error and only report gaps whose size >= 50bp.

A CNV could be a large deletion or a multiple-copy (duplication) of a region. It seems there is ambiguity between CNV and DUP.

tseemann commented 4 years ago

I'll correct this error and only report gaps whose size >= 50bp.

Can you make this a command line parameter please? When aligning reads to bacterial reference genomes we want to know every unmapped base.

hsinnan75 commented 4 years ago

Okay, I'll add one more option to set the size threshold of unmapped regions. However, some unmapped regions are due to deletion events. In such cases, they will not be reported.

hsinnan75 commented 4 years ago

Hi, the argument is "-min_gap" for specify the minimal gap size for unmapped regions. Please update to v0.9.9.18.

tseemann commented 4 years ago

But the deletion events are called as variants still?

hsinnan75 commented 4 years ago

Yes, deletions are called. An unmapped region will be reported if it is not overlapped with a deletion event.