vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Read coverage of haplotype sampling #4405

Open SC-Duan opened 2 days ago

SC-Duan commented 2 days ago

Hi, for haplotype sampling based on kmer counts in the reads, the note says that read coverage should be at least 20x to reliably tell heterozygous kmers from homozygous ones. My data has about 10x read coverage. Can I still use this method? Thank you!

jltsiren commented 2 days ago

We have never tried the method with read coverage lower than 20x.

With 10x read coverage, kmer coverage will likely be 7x or 8x. Kmers that occur 2-5 times will be classified as heterozygous and those that occur 6-17 or 6-20 times as homozygous. The classification will be noisy, but I can't say in advance if it will be good enough.
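
For anyone who wants to check the arithmetic, here is a rough sketch of where numbers like these can come from. It is not vg's code. It assumes error-free reads of uniform length, a kmer length of 29, and read lengths of 100 and 150 bp, none of which are stated in this thread. The function names are made up for illustration, and the classification simply hard-codes the count ranges quoted above.

```python
def kmer_coverage(read_coverage, read_length, k):
    """Expected kmer coverage for error-free reads of uniform length.

    A read of length L contains L - k + 1 kmers, so each base of read
    coverage contributes (L - k + 1) / L of kmer coverage.
    """
    return read_coverage * (read_length - k + 1) / read_length


def classify(count, het_range=(2, 5), hom_range=(6, 17)):
    """Label a kmer count using the ranges quoted in the comment above.

    The ranges are inclusive. Anything outside both ranges is "other".
    This is an illustration only, not the rule vg applies internally.
    """
    if het_range[0] <= count <= het_range[1]:
        return "heterozygous"
    if hom_range[0] <= count <= hom_range[1]:
        return "homozygous"
    return "other"


if __name__ == "__main__":
    # Assumed values: k = 29, read lengths 100 and 150 bp.
    for read_length in (100, 150):
        print(read_length, round(kmer_coverage(10, read_length, 29), 1))
    print([classify(c) for c in (1, 4, 8, 25)])
```

Under these assumptions, 10x read coverage gives roughly 7x to 8x kmer coverage. Sequencing errors are ignored here and would lower the figure further.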