schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

21-mer estimated genome size is twice the 17-mer estimate #36

Open baozg opened 4 years ago

baozg commented 4 years ago

Hi,

I used Jellyfish (47 Gb of Illumina PE150 data) to count k-mers at different values of k, but the 21-mer estimate is 874 Mb while the 17-mer estimate is 437 Mb. Strangely, the 17-mer plot shows three peaks, and the first peak looks spurious, since the second and third peaks are in a 1:2 ratio. Do you have any suggestions for my species? How can I get a sensible result?

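[image] https://user-images.githubusercontent.com/20680150/83316702-ebcb9c00-a259-11ea-8a86-22670ea475d4.png

[image] https://user-images.githubusercontent.com/20680150/83316783-9217a180-a25a-11ea-8f65-ed7e50be3b5f.png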

mschatz commented 4 years ago

Hi,

Yes, sometimes the model fitting can get confused. On the genomescope2 website we now have a parameter "Average k-mer coverage for polyploid genome" that can be used to give a hint to the model fitting routine so it can find the correct peak. I would try setting it to around 20 so that it finds the first peak at 20 and the second peak at around 40.
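To see why moving the fitted peak changes the estimate so sharply, here is a minimal sketch of the usual k-mer arithmetic: genome size is roughly the total number of k-mers divided by the k-mer coverage of the homozygous peak. The function name and the total k-mer count are hypothetical, chosen only for illustration and not measured from the data in this issue; GenomeScope's actual model also fits heterozygosity, sequencing errors, and repeats.

```python
def estimate_genome_size(total_kmers: int, homozygous_peak_cov: float) -> float:
    """Rough haploid genome size: total k-mers / homozygous peak coverage."""
    if homozygous_peak_cov <= 0:
        raise ValueError("peak coverage must be positive")
    return total_kmers / homozygous_peak_cov


# Hypothetical total number of non-error k-mers (illustrative only).
TOTAL_KMERS = 35_000_000_000

# Fit A: heterozygous peak at 20x, so the homozygous peak sits at 40x.
size_a = estimate_genome_size(TOTAL_KMERS, 40)
# Fit B (hypothetical): the 40x peak is treated as heterozygous,
# which pushes the assumed homozygous peak out to 80x.
size_b = estimate_genome_size(TOTAL_KMERS, 80)

print(f"fit A: {size_a / 1e6:.1f} Mb")
print(f"fit B: {size_b / 1e6:.1f} Mb")
print(f"ratio: {size_a / size_b:.1f}")
```

Halving the coverage assigned to the homozygous peak doubles the estimate, which is the same factor of two that separates the 874 Mb and 437 Mb results.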

Good luck

Mike

baozg commented 4 years ago

Hi, @mschatz

But we used wtdbg2 to assemble this genome; the assembly is 540 Mb, with BUSCO at 99% after polishing. We believe the 17-mer estimate is closer to this species' real genome size. If the peak is set at 20, the resulting 837 Mb is unlikely to be correct. It is very strange for the first peak to be at 20, since this is a diploid insect. Do you have any suggestions for this species? Or should I select the k-mers at depth 20 and BLAST them?

mschatz commented 4 years ago

Hi,

It is not unusual for a de novo assembly to be smaller than the true genome size since high copy repeats will often be excluded or mis-represented in an assembly. Neither of these fits is very good - is it possible that there is some contamination? Perhaps it was an endosymbiont like this: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1088942/
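One way to follow up on the contamination idea is to pull out the k-mers whose counts fall around the suspicious peak and search them against a sequence database. The sketch below assumes the counts have been exported as two-column text (one `KMER COUNT` pair per line, as produced by `jellyfish dump -c`) and formats the selected k-mers as FASTA. The function names, the 15 to 25 window, and the toy input are hypothetical choices for illustration, not something prescribed in this thread.

```python
from typing import Iterable, Iterator


def kmers_in_depth_window(lines: Iterable[str], low: int, high: int) -> Iterator[str]:
    """Yield k-mers whose count lies in [low, high] from 'KMER COUNT' lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        kmer, count = fields[0], int(fields[1])
        if low <= count <= high:
            yield kmer


def to_fasta(kmers: Iterable[str]) -> str:
    """Format k-mers as FASTA records named kmer_1, kmer_2, ..."""
    return "".join(f">kmer_{i}\n{kmer}\n" for i, kmer in enumerate(kmers, start=1))


# Made-up k-mers and counts standing in for real dump output.
sample = [
    "ACGTACGTACGTACGTACGTA 19",
    "TTTTTTTTTTTTTTTTTTTTT 412",
    "GATTACAGATTACAGATTACA 22",
]

# Keep only k-mers in a window that brackets a peak near depth 20.
selected = list(kmers_in_depth_window(sample, 15, 25))
print(to_fasta(selected), end="")
```

For a real run, the `sample` list would be replaced by an open file handle on the exported counts, and the FASTA string written to a file for the search step.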

Good luck

Mike
