Open s-kyungyong opened 1 year ago
Thanks for writing. Genomes (or subgenomes) with very low heterozygosity are always tricky because it can be hard to interpret what the different peaks represent. Looking at the wheat data again (especially supp fig 21) I think we may have misinterpreted the peaks. Now I think the peak at ~50x comes from kmers that are homozygous in each subgenome and not heterozygous kmers. In retrospect we should have imposed a "peak" at 25x to capture heterozygous kmers (of which there are very few). This would have the effect of expanding the haploid genome size (1C) to ~4.6Gb rather than 2.34G, and imply we should multiply 4.6Gb x 3 to get the 1C genome of ~13.8GB which is close to the published assembly and other previous reports from flow cytometry.
For your data, your analysis sounds right - if the haplotype size is 2.64, then you should multiply by 4 to get 10.56Gb for AABB. If you send the link to the genomescope output I can take a look to confirm
Good luck
Mike
On Thu, Aug 24, 2023 at 2:01 PM skyungyong @.***> wrote:
Hello!
In the paper, it is mentioned that "The current best assembly spans 15.34 Gbp, while the GenomeScope estimate is 14.1 Gbp (see Supplementary Figs. 21 and 22)". Fig. S21, GenomeScope estimated a haplotype size as 2.3545G, and this gives a genome size of 14.1 Gb (2.3545 * 6) for the hexaploid genome (AABBDD). However, I thought that the reference wheat genomes only have three representative haplotypes as each subgenome (e.g., AA) is homozygous and collapsed during the assembly, and the estimated genome size for 15.3 Gb is for ABD and not for AABBDD.
I also have a allotetraploid genome with a final genome size of 10.5 Gb (AB collapsed from AABB due to very low heterozygosity). GenomeScope estimated the haplotype size as 2.64 Gb, and this gives 2.64 x 4 = 10.56 Gb for AABB. The estimate looks right, but the calculation wasn't so clear to me. Perhaps, I am misunderstanding something! Could you clarify this for me please?
Thank you so much! Kyungyong
— Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/107, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABP343YAJCDUWLBTWKN4TLXW6JHVANCNFSM6AAAAAA35MQCJA . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Hi Mike,
Thank you for checking that! The situation I have for my genome is actually the same as the case of wheat genome. My genome is highly homozygous and thus collapsed to produce AB rather than AABB. This means that each haplotype is not 2.64Gb estimated by GenomeScope but ~5.0Gb. I will attatch the genomescope outputs here.
Thank you! Kyungyong
Thanks! Your analysis seems reasonable. If you share the histogram (or link to genomescope) I can try adjusting the parameters to get this value automatically
Good luck! Mike
On Fri, Sep 8, 2023 at 8:28 PM skyungyong @.***> wrote:
Hi Mike,
Thank you for checking that! The situation I have for my genome is actually the same as the case of wheat genome. My genome is highly homozygous and thus collapsed to produce AB rather than AABB. This means that each haplotype is not 2.64Gb estimated by GenomeScope but ~5.0Gb. I will attatch the genomescope outputs here.
GenomeScope_output.zip https://github.com/schatzlab/genomescope/files/12564270/GenomeScope_output.zip
Thank you! Kyungyong
— Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/107#issuecomment-1712358570, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABP345P5MTVEZK55V6PZT3XZOZZRANCNFSM6AAAAAA35MQCJA . You are receiving this because you commented.Message ID: @.***>
Hi Mike,
Here is the histrogram! These were the commandlines I used: reads.histo.zip
jellyfish count -C -m 21 -s 50000000000 -t 20 Kronos.HiFi.fastq -o kmer_counts.jf
jellyfish histo -h 5000000 -t 20 kmer_counts.jf > reads.histo
genomescope2 -p 4 -i reads.histo -o genomescope --verbose
Thank you! Kyungyong
Hello!
In the paper, it is mentioned that "The current best assembly spans 15.34 Gbp, while the GenomeScope estimate is 14.1 Gbp (see Supplementary Figs. 21 and 22)". Fig. S21, GenomeScope estimated a haplotype size as 2.3545G, and this gives a genome size of 14.1 Gb (2.3545 * 6) for the hexaploid genome (AABBDD). However, I thought that the reference wheat genomes only have three representative haplotypes as each subgenome (e.g., AA) is homozygous and collapsed during the assembly, and the estimated genome size for 15.3 Gb is for ABD and not for AABBDD.
I also have a allotetraploid genome with a final genome size of 10.5 Gb (AB collapsed from AABB due to very low heterozygosity). GenomeScope estimated the haplotype size as 2.64 Gb, and this gives 2.64 x 4 = 10.56 Gb for AABB. The estimate looks right, but the calculation wasn't so clear to me. Perhaps, I am misunderstanding something! Could you clarify this for me please?
Thank you so much! Kyungyong