schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

distinguishing recent WGD vs polyploidy in genomescope #87

Closed: mycowx closed this issue 1 year ago

mycowx commented 1 year ago

I'm working on assembling a large mollusc genome and I'm having some trouble interpreting the GenomeScope results derived from a 150 bp Illumina library. I suspect the genome is full of repetitive elements, but GenomeScope shows only ~10% unique sequence. I am wondering if this is the result of a recent WGD (whole-genome duplication) event or if it is possibly a ploidy issue. I am also unsure if the peaks are called correctly, as there is an uncalled shoulder on the right of the k-mer distribution that may correspond to the fourth peak. Any interpretations would be appreciated.

http://genomescope.org/genomescope2.0/analysis.php?code=zlf8P4pAGUdIfM31oxhe

tbenavi1 commented 1 year ago

Based on the k-mer spectrum, I believe this organism is tetraploid. Can you rerun GenomeScope with "-p 4" (ploidy 4 on the website) and then let us know if you have any further interpretation issues? Thanks.
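For anyone running GenomeScope 2.0 locally instead of through the website, the rerun suggested above might look roughly like the sketch below. It assumes a k-mer histogram already exists (for example from `jellyfish histo`). The file name `reads.histo`, the output directory `genomescope_p4`, and k = 21 are hypothetical placeholders and do not come from this thread. The snippet only builds and prints the command; it does not run it.

```python
import shlex


def genomescope_command(histogram, out_dir, kmer_length=21, ploidy=4):
    """Build the argument list for a GenomeScope 2.0 run with an explicit ploidy.

    The histogram path, output directory, and k-mer length are placeholders;
    the k-mer length must match the one used when counting k-mers.
    """
    if ploidy < 1:
        raise ValueError("ploidy must be a positive integer")
    return [
        "genomescope.R",
        "-i", histogram,         # k-mer histogram file
        "-o", out_dir,           # directory for plots and summary
        "-k", str(kmer_length),  # k used for counting (assumed 21 here)
        "-p", str(ploidy),       # ploidy of the model to fit
    ]


cmd = genomescope_command("reads.histo", "genomescope_p4")
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, provided `genomescope.R` is on the `PATH`.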

mycowx commented 1 year ago

Thank you for your quick response. Here is the genomescope result for a tetraploid model: http://genomescope.org/genomescope2.0/analysis.php?code=mTXSBGPfcjTWfd59aF2r

I have been running Smudgeplot over the weekend and hope to have results soon that can help with interpretation. Smudgeplot is currently consuming 500 GB of RAM while calculating k-mer pairs!

Thanks again for your help.

tbenavi1 commented 1 year ago

This model fit looks much better. From my experience, this organism is definitely tetraploid. Each of the 4 homologs is a little over 1 Gb of sequence. (For humans, each of the 2 homologs is around 3 Gb.)
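The arithmetic behind that comparison can be sketched as follows. It uses only the rounded figures quoted above, in gigabases: about 1 Gb per homolog for this mollusc (treated as a lower bound) and about 3 Gb per homolog for human. The second function illustrates why a tetraploid model expects a fourth peak: k-mers shared by n of the homologs tend to pile up near n times the per-homolog coverage. The coverage value of 20 is a hypothetical number chosen for illustration, not one taken from the linked reports.

```python
def total_sequence_gb(homolog_size_gb, ploidy):
    """Total sequence across all homologs, in gigabases."""
    return homolog_size_gb * ploidy


def expected_peak_coverages(homolog_coverage, ploidy):
    """Approximate k-mer coverage of peaks 1..ploidy in the spectrum."""
    return [homolog_coverage * n for n in range(1, ploidy + 1)]


mollusc_total = total_sequence_gb(1.0, 4)  # "a little over 1 Gb" x 4 homologs
human_total = total_sequence_gb(3.0, 2)    # "around 3 Gb" x 2 homologs
peaks = expected_peak_coverages(20, 4)     # hypothetical coverage of 20

print(f"mollusc, tetraploid: at least {mollusc_total:.0f} Gb in total")
print(f"human, diploid: about {human_total:.0f} Gb in total")
print(f"expected peak coverages: {peaks}")
```

Under these assumptions, a shoulder near four times the first peak's coverage would be consistent with the fourth peak of a tetraploid spectrum.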

mycowx commented 1 year ago

Yes, this makes sense to me! Thank you for providing this great piece of software and helping me understand the output! I'll close the issue. Cheers!