pbfrandsen closed this issue 2 years ago.
Hi Paul,
I agree, this looks fishy. I think the automatic fit got confused by the very high coverage. I got a much better fit by giving the model a hint about where the peaks should be, setting "Average k-mer coverage for polyploid genome" to 72: http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=KlXwOqo0GWE33TTYKcJ8
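Giving the model a peak hint amounts to telling it which coverage the real peaks sit at, so that it does not latch onto a spurious one. Below is a minimal sketch of one heuristic for finding candidate peak positions in a histogram of (coverage, count) pairs. It assumes the histogram is already loaded as a dict and is smooth enough for simple neighbor comparisons; real data usually needs smoothing first. The function names and toy data are hypothetical, and this is not how GenomeScope itself fits its model.

```python
"""Locate candidate peaks in a k-mer histogram (hypothetical helpers, not part of GenomeScope)."""


def error_trough(hist):
    """Return the first local minimum: the dip where the sequencing-error tail ends."""
    covs = sorted(hist)
    for prev, cur, nxt in zip(covs, covs[1:], covs[2:]):
        if hist[cur] < hist[prev] and hist[cur] <= hist[nxt]:
            return cur
    return None


def candidate_peaks(hist):
    """Return local maxima at or after the error trough, in order of coverage."""
    start = error_trough(hist)
    if start is None:
        return []
    covs = [c for c in sorted(hist) if c >= start]
    return [
        cur
        for prev, cur, nxt in zip(covs, covs[1:], covs[2:])
        if hist[cur] > hist[prev] and hist[cur] >= hist[nxt]
    ]


# Toy histogram {coverage: number of distinct k-mers}; not the data from this issue.
toy = {1: 900, 2: 300, 3: 50, 4: 10, 5: 20, 6: 40, 7: 60,
       8: 40, 9: 30, 10: 50, 11: 90, 12: 120, 13: 90, 14: 40}

print(error_trough(toy))     # 4
print(candidate_peaks(toy))  # [7, 12]
```

In a diploid, the heterozygous and homozygous peaks are expected at roughly one and two times the per-copy k-mer coverage. Checking that the candidate peaks sit in a ratio of about 1:2 (here 7 and 12) is therefore a quick sanity check before you choose a hint value.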
This shows a (haploid) genome size of about 94 Mbp with a heterozygosity rate of 0.35%. Does this seem reasonable? I noticed that your k-mer histogram is cut off at 10,000x coverage, so it may underestimate the genome size somewhat, since it excludes very high-copy repeats in satellite and centromeric regions. For a more robust estimate, I would recommend raising this cutoff to 100,000x or more.
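To see why a 10,000x cutoff biases the size downward, consider the usual back-of-the-envelope estimate: total k-mer instances (excluding the error tail) divided by the coverage of the main peak. This is a simplified rule of thumb, not the model GenomeScope fits, and the function name and toy numbers are hypothetical. Every k-mer above the cutoff is dropped from the numerator, and each high-copy repeat k-mer carries many instances.

```python
"""Back-of-the-envelope genome size from a k-mer histogram (simplified; not GenomeScope's model)."""


def naive_genome_size(hist, peak_cov, error_cutoff, max_cov=None):
    """Sum k-mer instances between error_cutoff and max_cov, then divide by peak_cov.

    hist: {coverage: number of distinct k-mers at that coverage}
    peak_cov: coverage of the main (homozygous) peak
    error_cutoff: coverages below this are treated as sequencing errors
    max_cov: histogram cutoff; k-mers above it are dropped (None keeps all)
    """
    total = 0
    for cov, n_kmers in hist.items():
        if cov < error_cutoff:
            continue  # error tail
        if max_cov is not None and cov > max_cov:
            continue  # lost to the histogram cutoff
        total += cov * n_kmers
    return total / peak_cov


# Toy data: a single k-mer from a very high-copy repeat at 20,000x.
toy = {1: 1000, 50: 10, 100: 20, 20000: 1}

print(naive_genome_size(toy, peak_cov=100, error_cutoff=5))                 # 225.0
print(naive_genome_size(toy, peak_cov=100, error_cutoff=5, max_cov=10000))  # 25.0
```

The single toy repeat k-mer contributes 20,000 instances on its own, so discarding it shrinks the estimate far more than its one histogram entry suggests. How much this matters for real data depends on how repetitive the genome is.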
Also, if you are planning to assemble these reads, I'd recommend randomly downsampling them so that the main peak (the mode of the distribution) is around 50x to 100x. Beyond this range, assemblers tend to get confused and can produce poorer assemblies.
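Below is a minimal sketch of random downsampling. It assumes uncompressed FASTQ with exactly four lines per record, and it takes the "main peak" to be the mode you read off the histogram. Because k-mer coverage scales roughly in proportion to the number of reads, the fraction to keep is the target peak coverage divided by the current one. The helper names, the seed, and the 300x example are hypothetical. In practice, an existing tool such as `seqtk sample` does the same job.

```python
"""Random downsampling of FASTQ reads toward a target peak coverage (illustrative sketch)."""

import random


def keep_fraction(current_peak_cov, target_peak_cov):
    """Fraction of reads to keep, assuming k-mer coverage scales linearly with read count."""
    if current_peak_cov <= target_peak_cov:
        return 1.0  # already at or below the target; keep everything
    return target_peak_cov / current_peak_cov


def downsample_fastq(lines, fraction, seed=11):
    """Yield whole 4-line FASTQ records, each kept independently with probability `fraction`.

    One random draw is made per record, so running this with the same seed on
    R1 and R2 files that list their reads in the same order keeps the pairs in sync.
    A trailing incomplete record is dropped.
    """
    rng = random.Random(seed)
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:
            if rng.random() < fraction:
                yield from record
            record = []


# Hypothetical example: the main peak is at 300x and the target is 75x.
fraction = keep_fraction(current_peak_cov=300, target_peak_cov=75)
print(fraction)  # 0.25

reads = []
for i in range(1000):
    reads += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
kept = list(downsample_fastq(reads, fraction, seed=11))
print(len(kept) // 4, "of 1000 records kept")
```

Sampling each record independently keeps memory use constant, so the approach also works on files too large to load at once; the trade-off is that the number of reads kept is only approximately `fraction` times the total.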
Good luck
Mike
On Wed, Jun 29, 2022 at 12:22 PM Paul Frandsen @.***> wrote:
Dear GenomeScope developers, thank you for the great tool. We use it a lot and it's great! I recently got an interesting result: a genome size estimate much smaller than we expected (nearly half the size). I noticed that the error curve was fit to one of the smaller peaks, and I wondered whether this could be the reason for the low estimate. Any thoughts on whether this is the case, and whether there is a parameter we could change to avoid it?
http://genomescope.org/genomescope2.0/analysis.php?code=FsUZp8wcQEMdj4JJUvol
Many thanks,
Paul Frandsen
Thanks, Mike! That is really close to the published estimates and much more in line with what we expected. Many thanks for the help.