Closed AlesBucek closed 2 years ago
Hi Ales, This is actually expected behavior. Hifiasm does k-mer counting and peak coverage estimation multiple times during its initial error correction stage, although in a normal case the two values remain almost the same throughout it. Here it seems like the heterozygous peak coverage estimation is wrong from the beginning (it should be ~70, I remember).
Hi Yoshi, thanks for the comment! I've seen "incorrect" estimates of homozygous and heterozygous coverage by hifiasm also for different assemblies but only sometimes it leads to incorrect thresholds for the internal purge_dups of hifiasm: E.g.: sample "KT1060-2+2UL": 200/-1,22/-1,21/-1,22/-1,22/-1 and "purge duplication coverage threshold: 250". The "true" homozygous coverage is ~200. but sample "KT1060-2_UL": 146/14,13/-1,14/-1,14/-1,14/-1 and "purge duplication coverage threshold: 92". The "true" homozygous coverage is ~148.
So I'm guessing that hifiasm is estimating the coverage thresholds for purging internally and failure to calculate the heterozygous or homozygous coverage peaks only sometimes leads to wrong purging. I did not find option how to set the thresholds directly (such as in purge_dups) but defining the hom-cov seems to work well. Cheers, Ales
I see, I didn't care about hifiasm's internal purging thresholds, but what you mentioned sounds reasonable. It looks hifiasm automatically determines the thresholds based on the hom/het coverage estimates.
Hi Yoshi, I noticed that slightly different k-mer coverage histograms are plotted 5-times in the hifiasm.log: what is the significance? I didn't find explanation in the hifiasm documentation but it also does not seem to be something that you specifically set via a parameter in run_hifiasm.sh. Thanks in advance for any hints! Ales
[M::ha_ft_gen] peak_hom: 146; peak_het: 14 [M::ha_pt_gen] peak_hom: 13; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1