schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

tetraploid genome result interpretation #141

Open Liyong-Zhang opened 3 days ago

Liyong-Zhang commented 3 days ago

Hi there,

I am using GenomeScope2 to check the heterozygosity rate of a plant genome (2n=28) with HiFi reads.

I ran mummerplot on the initial Hifiasm assembly with the A. thaliana genome as the reference, and based on the plot this plant looks like a tetraploid. (Attached image: mummerplot_v6)

The command I used to check the heterozygosity rate is: genomescope.R -i reads_fasta.histo -o ./ -p 4 -k 21 -n "p4"

The results are in p4_summary.txt, and the attached plots are p4_linear_plot, p4_log_plot, p4_transformed_linear_plot, and p4_transformed_log_plot.

I have trouble understanding the results. What's the overall heterozygosity rate? Is it 7.85371%?
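For reference, here is a minimal Python sketch of how an overall rate could be derived from per-form rates. It assumes the overall heterozygosity is the combined rate of every form other than the fully homozygous aaaa. The function name and the numbers are hypothetical placeholders for illustration only; they are not values from p4_summary.txt and not part of GenomeScope itself.

```python
def total_heterozygosity(form_rates_pct):
    """Sum the rates (in percent) of all forms with more than one distinct allele.

    Assumption: each key is a tetraploid form label such as "aaab", and the
    overall heterozygosity is everything that is not the homozygous "aaaa".
    """
    return sum(
        rate
        for form, rate in form_rates_pct.items()
        if len(set(form)) > 1  # "aaaa" has one distinct letter, so it is skipped
    )


# Hypothetical placeholder rates (percent), NOT taken from the real summary file.
example_rates = {
    "aaaa": 95.0,
    "aaab": 3.0,
    "aabb": 1.5,
    "aabc": 0.3,
    "abcd": 0.2,
}

overall = total_heterozygosity(example_rates)
print(f"overall heterozygosity: {overall:.4f}%")
```

With the form rates from the summary file filled in, the result could be compared against the single heterozygosity figure the tool reports.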

Also, since the rates of the aabc and abcd forms are very low (aabc 0.001%, abcd 0.0121%), could I treat this plant as a diploid when assembling with Hifiasm, given that Hifiasm doesn't fully support polyploid genomes yet (https://hifiasm.readthedocs.io/en/latest/faq.html#are-polyploid-genomes-supported)?

Thank you so much for your help!