marbl / merqury

k-mer based assembly evaluation

percentage of heterozygosity #63

Closed: m-jahani closed this issue 2 years ago

m-jahani commented 2 years ago

Hi @arangrhie, thanks for the excellent tool.

I am working with two haplotype assemblies from a diploid genome (assembled with hifiasm using HiFi + Hi-C data).

I am wondering if there is a way to estimate the percentage of heterozygosity by comparing the number of k-mers shared between the two haplotypes with the number of haplotype-specific k-mers. I would appreciate your help with this.

Thanks, Mojtaba

arangrhie commented 2 years ago

Hi Mojtaba,

I would suggest using GenomeScope2 for that purpose, with the read k-mer histogram obtained from:

meryl histogram reads.meryl > reads.hist
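For reference, the full path from reads to a GenomeScope2 estimate might look like the sketch below. The function name, the output naming and the default k=21 are assumptions for illustration (Merqury's best_k.sh can suggest k from the genome size); the `meryl count` and `genomescope2` options used are the standard ones, but check them against your installed versions.

```shell
# Hypothetical helper: estimate heterozygosity from raw reads with meryl + GenomeScope2.
# Assumes meryl and genomescope2 are on PATH; K defaults to 21 (a common choice, not a rule).
estimate_heterozygosity() {
  local out=$1             # output prefix, e.g. "sample"
  shift                    # remaining arguments: read files (FASTA/FASTQ)
  local k=${K:-21}         # override with e.g. K=19 estimate_heterozygosity ...

  # 1) Count k-mers from all read files into a single meryl database.
  meryl k="$k" count output "$out.meryl" "$@" || return
  # 2) Write the k-mer frequency histogram (same command as above).
  meryl histogram "$out.meryl" > "$out.hist" || return
  # 3) Fit GenomeScope2's model; -k must match the k used for counting.
  genomescope2 -i "$out.hist" -o "${out}_genomescope" -k "$k"
}
```

For example, `estimate_heterozygosity sample hifi_reads.fastq` would leave the model fit in `sample_genomescope/`, where GenomeScope2 reports its heterozygosity estimate in the summary output. Because it works only from read k-mers, this avoids the assembly errors and haplotype switches that affect assembly-based counts.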

1) Using shared k-mers between the two haplotypes: I assume these are obtained from your haplotype assemblies? I wouldn't rely on the assembled sequences, as they are likely to contain errors and haplotype switches.

2) Haplotype-specific k-mers: I guess you mean estimating heterozygosity = (maternal + paternal hapmers) / all read k-mers?

Hapmers are the inherited k-mers that can be distinguished as coming from one specific parental genome. K-mers from a region that is heterozygous in both parents (e.g. both parents are AB, and the child inherits AB or BA) are not included, so using the equation above would under-estimate the level of heterozygosity.
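To make the under-estimation concrete, here is a toy sketch with hypothetical data. It mimics only the set logic of hapmers (k-mers found in one parent but not the other, and also present in the child) using `comm` on sorted lists. The "k-mers" are site:allele tokens rather than real DNA k-mers, and this is not Merqury's hapmer pipeline, which works on meryl databases with count filtering.

```shell
#!/usr/bin/env bash
# Toy illustration (hypothetical data): hapmer counts miss heterozygous sites
# that are also heterozygous in both parents. Tokens are site:allele, not DNA.
set -euo pipefail
export LC_ALL=C                      # sort and comm must agree on collation order

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp"

# Write the arguments as a sorted, de-duplicated set (comm needs sorted input).
set_of() { printf '%s\n' "$@" | sort -u; }

# Parental k-mers at four sites:
#   s1: mother A/B, father A/B  (heterozygous in both parents)
#   s2: mother A/A, father B/B
#   s3: mother A/A, father A/A
#   s4: mother A/C, father B/B
set_of s1:A s1:B s2:A s3:A s4:A s4:C > mother.txt
set_of s1:A s1:B s2:B s3:A s4:B > father.txt

# Child haplotypes: maternal one carries A,A,A,C at s1..s4; paternal one B,B,A,B.
set_of s1:A s2:A s3:A s4:C > child_mat.txt
set_of s1:B s2:B s3:A s4:B > child_pat.txt
sort -u child_mat.txt child_pat.txt > child.txt

# Hapmers: k-mers present in only one parent AND in the child.
comm -23 mother.txt father.txt | comm -12 - child.txt > mat_hapmers.txt
comm -13 mother.txt father.txt | comm -12 - child.txt > pat_hapmers.txt
sort mat_hapmers.txt pat_hapmers.txt > hapmers.txt

# Truly heterozygous child k-mers: present on exactly one of its two haplotypes.
comm -3 child_mat.txt child_pat.txt | tr -d '\t' | sort > child_het.txt

mat=$(( $(wc -l < mat_hapmers.txt) ))
pat=$(( $(wc -l < pat_hapmers.txt) ))
all=$(( $(wc -l < child.txt) ))
het=$(( $(wc -l < child_het.txt) ))

echo "hapmer-based: $((mat + pat))/$all"
echo "heterozygous: $het/$all"
echo "missed:" $(comm -23 child_het.txt hapmers.txt)
```

This prints `hapmer-based: 4/7`, `heterozygous: 6/7` and `missed: s1:A s1:B`: at site s1 both of the child's alleles are present in both parents, so neither counts as a hapmer even though the child is heterozygous there.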

Thanks, Arang