schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

Genomescope failed to converge with -C option in Jellyfish #20

Open TypicalSEE opened 5 years ago

TypicalSEE commented 5 years ago

Hi, I'm using Jellyfish + GenomeScope on paired-end NGS data. GenomeScope reports that it failed to converge, and the result does not look correct. I tried changing the k-mer length from 21 to 19, but that didn't help. I'm fairly sure there is enough sequencing data. When I turn off the -C option in Jellyfish, GenomeScope seems to work fine. I wonder what the reason is. Can anyone help? The histo file is attached. Thanks!

Here are the commands I used:

jellyfish count -C -m 19 -t 16 -s 16G <(zcat CL100122025_L01_read_1.fq.gz) <(zcat CL100122025_L01_read_2.fq.gz) -o reads.jf
jellyfish histo -t 16 reads.jf > reads.histo
Rscript genomescope.R reads.histo 19 150 result

reads.histo.txt

malonge commented 5 years ago

What is your expected genome size and how much coverage do you think you have roughly? I wonder if there is too much coverage.
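For anyone wanting to answer the coverage question quickly, a rough depth figure is total sequenced bases divided by the expected genome size. The sketch below is only a back-of-envelope helper: the function name `rough_coverage` is hypothetical, and it assumes standard 4-line FASTQ records.

```shell
# Rough sequencing depth: total bases in the reads divided by the expected genome size.
# $1 = expected genome size in bp; remaining arguments = FASTQ files (plain or gzip-compressed).
# Assumes 4-line FASTQ records (no wrapped sequence lines).
rough_coverage() {
  size="$1"; shift
  for f in "$@"; do
    case "$f" in
      *.gz) gzip -dc "$f" ;;
      *)    cat "$f" ;;
    esac
  done | awk -v size="$size" '
    NR % 4 == 2 { bases += length($0) }   # line 2 of each record is the sequence
    END { printf "%.1f\n", bases / size }
  '
}
```

With the files from the original post this would be called as `rough_coverage <genome size in bp> CL100122025_L01_read_1.fq.gz CL100122025_L01_read_2.fq.gz`.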

malonge commented 5 years ago

Can you also provide the histo file produced when not using -C?

luo9595 commented 2 weeks ago

I want to ask whether the predicted genome size was normal after you turned off the -C option. With -C turned off, my predicted genome size is twice the actual assembly size, but with -C turned on I can't get a plot at all.

mschatz commented 2 weeks ago

It is important to use the -C flag; without it you will cut the coverage in half, since k-mers from the forward and reverse strands will be counted separately. If it is not working with the -C flag, you might need to adjust some of the other settings. Can you share the GenomeScope link (or the histo file)?
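To illustrate the point about strands, here is a toy sketch of canonical k-mer counting. It is not how Jellyfish is implemented; the helper names `revcomp` and `canonical` are made up for the example, and it assumes uppercase A/C/G/T input. The same locus read once from each strand shows up as two different k-mers unless both are mapped to a canonical form first.

```shell
# Reverse-complement a DNA string (assumes uppercase A/C/G/T only).
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGT' 'TGCA'
}

# Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
canonical() {
  printf '%s\n%s\n' "$1" "$(revcomp "$1")" | LC_ALL=C sort | head -n 1
}

# The same 5-mer locus, observed once on each strand.
fwd=AAGTC
rev_read=$(revcomp "$fwd")

# Strand-specific counting: two distinct k-mers, each seen once.
strand_specific=$(printf '%s\n%s\n' "$fwd" "$rev_read" | LC_ALL=C sort -u | wc -l | tr -d ' ')

# Canonical counting: one distinct k-mer, seen twice.
canonical_distinct=$(printf '%s\n%s\n' "$(canonical "$fwd")" "$(canonical "$rev_read")" | LC_ALL=C sort -u | wc -l | tr -d ' ')

echo "distinct k-mers without canonical form: $strand_specific"
echo "distinct k-mers with canonical form: $canonical_distinct"
```

Splitting each locus across two k-mers is what halves the per-k-mer coverage when -C is left off.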

Good luck

Mike


luo9595 commented 2 weeks ago

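@mschatz Here is my histo file: 19_kmer.histo.txt (https://github.com/user-attachments/files/17143946/19_kmer.histo.txt). I used the quality-controlled next-generation sequencing data to estimate the genome size, with a k-mer length of 19. The actual genome size is about 2.6M, but the estimated genome size is 5.2M. The parameters I used are shown in the attached image: image.png (https://github.com/user-attachments/assets/0896a6bb-7567-45ac-91f5-f07674e4eaba)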

mschatz commented 1 week ago

If the genome is expected to be only 2.6 Mbp in size, I'm guessing this is a bacterium. I'm very surprised by the k-mer distribution (in blue), which has two peaks at ~150x and ~300x. This is usually the signature of a heterozygous sample. Perhaps this is not a clonal population? The variations in the population will increase the genome size estimate. Have you tried running an assembler? I would normally recommend SPAdes for a short-read bacterial genome assembly.
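As a back-of-envelope illustration of why a two-peak profile can give a doubled size estimate: a naive estimate divides the total number of k-mer occurrences by the coverage of the main peak, so treating the ~150x peak rather than the ~300x peak as the full-coverage peak doubles the result. GenomeScope fits a more detailed model than this; the function name `naive_genome_size` and the default error cutoff of 5 are assumptions made for the sketch.

```shell
# Naive genome size estimate from a Jellyfish histogram
# (two columns: k-mer multiplicity, number of distinct k-mers with that multiplicity).
# $1 = histo file
# $2 = assumed coverage of the full-coverage (homozygous) peak
# $3 = minimum multiplicity to include, to skip likely error k-mers (assumed default: 5)
naive_genome_size() {
  awk -v peak="$2" -v min="${3:-5}" '
    $1 >= min { total += $1 * $2 }          # total k-mer occurrences above the cutoff
    END { printf "%.0f\n", total / peak }
  ' "$1"
}
```

For example, `naive_genome_size 19_kmer.histo.txt 150` and `naive_genome_size 19_kmer.histo.txt 300` will differ by exactly a factor of two, whatever the histogram contains.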

I would also comment that you probably have too much coverage for the assembler. I would recommend a random downsampling to keep only 33% of the reads. After this, your peak at 150x will shift to 50x and the peak at 300x will shift to 100x. This will often really improve the assembly.
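A minimal sketch of such a random downsampling for paired reads, using only `paste` and `awk`. The function name `downsample_pairs` and its argument order are made up for the example. It assumes uncompressed FASTQ with 4 lines per record, mates in the same order in both files, and no tab characters in the read headers. Each pair is kept or dropped as a unit so the two output files stay in sync.

```shell
# Keep each read pair with probability $3 (e.g. 0.33).
# $1, $2 = input R1 and R2 FASTQ files (uncompressed, 4 lines per record, same order)
# $4, $5 = output R1 and R2 FASTQ files
# $6     = random seed (assumed default: 42), so the sample is reproducible
downsample_pairs() {
  paste "$1" "$2" | awk -F '\t' -v p="$3" -v out1="$4" -v out2="$5" -v seed="${6:-42}" '
    BEGIN { srand(seed) }
    NR % 4 == 1 { keep = (rand() < p) }     # decide once per pair, on the header line
    keep { print $1 > out1; print $2 > out2 }
  '
}
```

Usage would look like `downsample_pairs read_1.fq read_2.fq 0.33 sub_1.fq sub_2.fq`, after decompressing the inputs. Since k-mer coverage scales with the fraction of reads kept, a fraction of 0.33 moves a 150x peak to roughly 50x.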

Here is the link to the profile with default parameters: http://genomescope.org/genomescope2/analysis.php?code=ubcE9uhKsJjDwDLeHN25

Good luck

Mike
