yoshihikosuzuki / ant-asm-workflow

Genome assembly workflow with HiFi + Omni-C
1 stars 1 forks source link

hifiasm peak_hom/peak_het: 5 estimates in hifiasm.log? #2

Closed AlesBucek closed 2 years ago

AlesBucek commented 2 years ago

Hi Yoshi, I noticed that slightly different k-mer coverage histograms are plotted 5-times in the hifiasm.log: what is the significance? I didn't find explanation in the hifiasm documentation but it also does not seem to be something that you specifically set via a parameter in run_hifiasm.sh. Thanks in advance for any hints! Ales

sampleID=KT1060-2_UL
grep "peak_hom" $sampleID/01-asm/hifiasm/hifiasm.log

[M::ha_ft_gen] peak_hom: 146; peak_het: 14 [M::ha_pt_gen] peak_hom: 13; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1 [M::ha_pt_gen] peak_hom: 14; peak_het: -1

yoshihikosuzuki commented 2 years ago

Hi Ales, This is actually expected behavior. Hifiasm does k-mer counting and peak coverage estimation multiple times during its initial error correction stage, although in a normal case the two values remain almost the same throughout it. Here it seems like the heterozygous peak coverage estimation is wrong from the beginning (it should be ~70, I remember).

AlesBucek commented 2 years ago

Hi Yoshi, thanks for the comment! I've seen "incorrect" estimates of homozygous and heterozygous coverage by hifiasm also for different assemblies but only sometimes it leads to incorrect thresholds for the internal purge_dups of hifiasm: E.g.: sample "KT1060-2+2UL": 200/-1,22/-1,21/-1,22/-1,22/-1 and "purge duplication coverage threshold: 250". The "true" homozygous coverage is ~200. but sample "KT1060-2_UL": 146/14,13/-1,14/-1,14/-1,14/-1 and "purge duplication coverage threshold: 92". The "true" homozygous coverage is ~148.

So I'm guessing that hifiasm is estimating the coverage thresholds for purging internally and failure to calculate the heterozygous or homozygous coverage peaks only sometimes leads to wrong purging. I did not find option how to set the thresholds directly (such as in purge_dups) but defining the hom-cov seems to work well. Cheers, Ales

yoshihikosuzuki commented 2 years ago

I see, I didn't care about hifiasm's internal purging thresholds, but what you mentioned sounds reasonable. It looks hifiasm automatically determines the thresholds based on the hom/het coverage estimates.