schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

High error rate: real Illumina sequencing errors? #52

Status: Open. lychen83 opened this issue 3 years ago.

lychen83 commented 3 years ago

Dear all,

I have Illumina data (2 × 150 bp reads), 110 Gb in total, which I cleaned with Trimmomatic. I used GenomeScope with k = 21 to estimate the genome size, and the estimate is about 1.1 Gb. However, the estimated error rate is as high as 2.2%, and the heterozygosity is as high as 4.65%. When I use only 60 Gb of the data, GenomeScope reports 'Failed to converge'. I have used GenomeScope for many species and have never encountered this problem before.

Why is the error rate so high?

I appreciate your help.

Best,

Chen

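[image: enh_plot] https://user-images.githubusercontent.com/28940942/107110407-8026dd80-6882-11eb-82ee-b0dbd2d9ccf5.png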
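As a rough sanity check, the numbers in this post can be turned into expected sequencing depth. The sketch below assumes the 1.1 Gb estimate is correct and treats 110 Gb and 60 Gb as total base counts (trimming with Trimmomatic lowers the real depth somewhat). It uses the standard conversion from base coverage C to k-mer coverage, C × (L − k + 1) / L, for read length L. The function names are illustrative only and are not part of GenomeScope.

```python
def base_coverage(total_bases: float, genome_size: float) -> float:
    """Average sequencing depth: total sequenced bases / genome size."""
    return total_bases / genome_size


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Convert base coverage to k-mer coverage.

    A read of length L yields L - k + 1 k-mers, so the expected
    k-mer depth is C * (L - k + 1) / L.
    """
    return base_cov * (read_len - k + 1) / read_len


GENOME = 1.1e9   # assumption: GenomeScope's 1.1 Gb estimate from the post is correct
READ_LEN = 150   # 2 x 150 bp reads
K = 21           # k-mer size used in the post

for label, bases in [("110 Gb", 110e9), ("60 Gb", 60e9)]:
    c = base_coverage(bases, GENOME)
    ck = kmer_coverage(c, READ_LEN, K)
    print(f"{label}: ~{c:.0f}x base coverage, ~{ck:.0f}x k-mer coverage")
```

Under these assumptions, this prints about 100x base and 87x k-mer coverage for 110 Gb, and about 55x and 47x for 60 Gb. In a diploid genome the heterozygous peak sits at roughly half the k-mer coverage (around 23x for the 60 Gb subset), so very high heterozygosity leaves less depth in each peak than the headline numbers suggest.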

mschatz commented 3 years ago (replying to the post of Sat, Feb 6, 2021)

Thanks for your interest. This is a bad fit, which usually happens when the data contain extensive sequencing errors or possibly contamination. Unfortunately, there is not much that can be done to overcome situations like this other than to collect additional data.

Good luck

Mike
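To see why a 2.2% error estimate is unusual, note that if errors hit bases independently at per-base rate e, the chance that a k-mer contains at least one error is 1 − (1 − e)^k. The sketch below assumes that simple independent-error model, which is only an approximation of what GenomeScope fits. The 0.2% comparison rate is a hypothetical value chosen for illustration, not a number from this thread.

```python
def frac_kmers_with_error(e: float, k: int) -> float:
    """Probability that a k-mer contains >= 1 sequencing error.

    Assumes independent per-base errors at rate e (a simplification,
    not GenomeScope's exact model).
    """
    return 1.0 - (1.0 - e) ** k


K = 21
# 0.002 is a hypothetical comparison rate; 0.022 is the 2.2% reported in this issue
for e in (0.002, 0.022):
    share = frac_kmers_with_error(e, K)
    print(f"error rate {e:.1%}: {share:.0%} of {K}-mers contain an error")
```

With this model, about 37% of 21-mers would carry an error at 2.2%, versus about 4% at 0.2%. An estimate this high therefore means the fit is attributing a large share of the low-frequency k-mers to errors, whatever their true source.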

lychen83 commented 3 years ago

Thanks, Mike,

Is it possible that the high heterozygosity of my genome is causing the high error rate?

Best,

Lingyun Chen

mschatz commented 3 years ago (replying to the comment of Mon, Feb 8, 2021)

I'm sure that is contributing to the problem, but it seems to be more than just high heterozygosity.

Good luck

Mike
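A similar back-of-the-envelope calculation shows how strongly 4.65% heterozygosity shapes a k = 21 profile. Assuming heterozygous sites are independent at per-base rate h (a simplification), the chance that a 21-mer spans at least one heterozygous site is 1 − (1 − h)^k. In a diploid genome, which GenomeScope 1.0 assumes, such k-mers occur at about half the homozygous coverage. The 1% comparison value is hypothetical.

```python
def frac_kmers_spanning_het(h: float, k: int) -> float:
    """Probability that a k-mer covers >= 1 heterozygous site.

    Assumes independent heterozygous sites at per-base rate h
    (a simplification of real variant distributions).
    """
    return 1.0 - (1.0 - h) ** k


K = 21
# 0.01 is a hypothetical comparison value; 0.0465 is the 4.65% reported in this issue
for h in (0.01, 0.0465):
    share = frac_kmers_spanning_het(h, K)
    print(f"heterozygosity {h:.2%}: {share:.0%} of {K}-mers span a het site")
```

By this rough estimate, nearly two thirds of 21-mers overlap a heterozygous site at 4.65%, compared with about a fifth at 1%, so the half-coverage (heterozygous) peak would dominate the spectrum. That makes the model harder to fit, although, as Mike notes, heterozygosity alone may not explain the problem.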

YingChen94 commented 9 months ago

Hi Lingyun @lychen83,

I have a plot very similar to yours. Did you collect additional data, and did more data give you a better plot?

Thanks! Ying

lychen83 commented 9 months ago (replying to the comment of Fri, Sep 15, 2023)

I have a species for which I sequenced 200 Gb for GenomeScope. However, it still failed. I suspect the problem goes beyond data size.

Best, Lingyun

YingChen94 commented 9 months ago

Thank you, Lingyun, for your reply! That's scary to hear. Did your assembly work?

Thanks again! Ying