schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

potentially poor fitting model for proposed tetraploid #113

Open sjfleck opened 11 months ago

sjfleck commented 11 months ago

Hello, thank you for GenomeScope2. It's been a useful tool along with SmudgePlot. I have one GenomeScope profile where I'm not sure the model is accurate. The 1n hump isn't there in the observed data, but it is present in the full model and unique sequence curves. There are clear 2n and 4n humps, and SmudgePlot proposed this sample as a tetraploid. I will share the plots below:

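[image: Ua_GS2_SP] https://user-images.githubusercontent.com/53409202/279149128-e3b40098-4a4f-4fb1-8554-4118668d3574.png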

I'm checking in on this because this species has 53% duplicated BUSCOs, and I was planning to run Purge Haplotigs on the assembly to reduce it to a haploid assembly (as long as it's not already one). If there is a heterozygous peak, it should be at ~72, but I'm not seeing one to designate for Purge Haplotigs to work on. Any insights into this or recommendations would be greatly appreciated. Thank you.
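
For readers following the arithmetic behind the "~72" figure: peaks in a coverage histogram are generally expected at integer multiples of the single-copy coverage. The short Python sketch below only illustrates that relationship. The function name `expected_peaks` and the 144x full-copy value are assumptions picked so that the 2n position lands on ~72; they are not taken from the plots in this issue.

```python
def expected_peaks(full_copy_peak: float, ploidy: int) -> dict[str, float]:
    """Return the expected peak position for each copy number 1n..ploidy.

    Assumes the peak at `full_copy_peak` holds sequence present in all
    `ploidy` copies, so the single-copy coverage is full_copy_peak / ploidy.
    """
    single_copy = full_copy_peak / ploidy
    return {f"{k}n": single_copy * k for k in range(1, ploidy + 1)}


# Illustrative value only: a tetraploid whose 4n peak sits at 144x.
peaks = expected_peaks(144.0, 4)
for label, coverage in peaks.items():
    print(f"{label}: ~{coverage:.0f}x")
```

Under those assumptions the 1n peak would sit near a quarter of the full-copy coverage, which is one way to judge whether a missing 1n hump in the observed data is a real absence or a fitting problem.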

mschatz commented 10 months ago

I agree the plots are confusing. I'm guessing everything between 20x and 100x represents some level of heterozygosity, while the major peak at 150x represents homozygous k-mers. I agree it would help to run Purge Haplotigs, and I would start with values around -l 100 -m 200 -h 500, but you should try several values to see how they impact the BUSCO score. Fortunately, Purge Haplotigs should only take a few minutes to run.
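
One possible way to organize the "try several values" step is to generate the `purge_haplotigs cov` command lines for a small grid around the suggested cutoffs. This is a minimal sketch, not a tested pipeline: the input name `aligned.bam.gencov`, the grid values, and the output file naming are all assumptions, and the flags should be checked against the help output of the installed Purge Haplotigs version. The script only prints the commands and does not run them.

```python
from itertools import product

# Hypothetical name for the coverage file produced by the histogram step.
GENCOV = "aligned.bam.gencov"

# Assumed grid around the suggested starting point (-l 100 -m 200 -h 500).
LOWS = (80, 100, 120)
MIDS = (180, 200, 220)
HIGH = 500


def cov_command(gencov: str, low: int, mid: int, high: int) -> str:
    """Build one `purge_haplotigs cov` command line for the given cutoffs."""
    out = f"coverage_stats_l{low}_m{mid}_h{high}.csv"  # hypothetical naming
    return f"purge_haplotigs cov -i {gencov} -l {low} -m {mid} -h {high} -o {out}"


# Keep only combinations where the cutoffs are strictly increasing.
commands = [
    cov_command(GENCOV, low, mid, HIGH)
    for low, mid in product(LOWS, MIDS)
    if low < mid < HIGH
]

for command in commands:
    print(command)
```

Each resulting coverage table would then go through the purge step, followed by a BUSCO run on the curated assembly, so the duplicated-BUSCO percentage can be compared across cutoff choices.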

Good luck! Mike
