Status: Open. TypicalSEE opened this issue 5 years ago.
What is your expected genome size, and roughly how much coverage do you think you have? I wonder if there is too much coverage.
Can you also provide the histo file produced without -C?
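To put rough numbers on the coverage question, here is a minimal sketch, assuming uniform coverage and fixed-length reads. Sequencing depth is total sequenced bases divided by genome size, and k-mer coverage is a bit lower because each read of length L yields only L - k + 1 k-mers. The function names and the 4,000,000-read example are hypothetical; only k = 19, 150 bp reads, the ~150x k-mer peak and the ~2.6 Mb size come from this thread.

```python
def base_coverage(n_reads: int, read_len: int, genome_size: float) -> float:
    """Sequencing depth C = total sequenced bases / genome size."""
    return n_reads * read_len / genome_size


def kmer_coverage(base_cov: float, read_len: int, k: int) -> float:
    """Expected k-mer depth: a read of length L yields L - k + 1 k-mers,
    so Ck = C * (L - k + 1) / L."""
    return base_cov * (read_len - k + 1) / read_len


def base_from_kmer_peak(kmer_peak: float, read_len: int, k: int) -> float:
    """Invert the relation above to recover base coverage from a k-mer peak."""
    return kmer_peak * read_len / (read_len - k + 1)


if __name__ == "__main__":
    # Hypothetical run: 4,000,000 reads of 150 bp against a 2.6 Mb genome.
    c = base_coverage(4_000_000, 150, 2.6e6)
    print(f"base coverage ~{c:.0f}x, k=19 k-mer coverage ~{kmer_coverage(c, 150, 19):.0f}x")
    # Base coverage implied by a k-mer peak at 150x (k = 19, 150 bp reads).
    print(f"a 150x k-mer peak implies ~{base_from_kmer_peak(150, 150, 19):.0f}x base coverage")
```

With k = 19 and 150 bp reads the correction factor is 132/150, so k-mer and base coverage differ by only about 12%.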
I'd like to ask whether the predicted genome size is reasonable when the -C option is turned off: with -C off, my predicted genome size is twice the actual assembly size. But with -C on, I can't get a plot.
It is important to use the -C flag; otherwise you will cut the coverage in half, since k-mers from the forward and reverse strands will be counted separately (which also roughly doubles the genome size estimate). If it is not working with the -C flag, you might need to adjust some of the other settings. Can you share the GenomeScope link (or the histo file)?
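The effect of -C can be reproduced on toy data. jellyfish's -C option counts each k-mer together with its reverse complement; the sketch below mimics that idea with `min(kmer, revcomp(kmer))` and uses a naive "total k-mers / peak coverage" estimate, which is a simplification and not GenomeScope's model. The random 500 bp genome and the "reads" (whole copies of each strand) are hypothetical.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_kmers(reads, k, canonical):
    """Count k-mers; with canonical=True each k-mer is merged with its
    reverse complement (the idea behind jellyfish's -C option)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if canonical:
                kmer = min(kmer, revcomp(kmer))
            counts[kmer] += 1
    return counts


def summarize(counts):
    """Return (peak k-mer coverage, naive size = total k-mers / peak)."""
    histo = Counter(counts.values())   # coverage -> number of distinct k-mers
    peak = max(histo, key=histo.get)   # most common coverage level
    return peak, sum(counts.values()) / peak


rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(500))
# Hypothetical toy "reads": 20 full-length copies from each strand (40x total).
reads = [genome] * 20 + [revcomp(genome)] * 20
k = 19

for canonical in (True, False):
    peak, size = summarize(count_kmers(reads, k, canonical))
    print(f"canonical={canonical}: peak {peak}x, naive size {size:.0f} k-mers")
```

Without canonical counting, each strand's k-mers land in separate bins at half the depth, so the peak halves and the size estimate doubles, which matches the 2.6 Mb vs 5.2 Mb discrepancy reported here. Sizes are in k-mer positions (genome length - k + 1).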
Good luck
Mike
(In reply to luo9595, Tue, Sep 24, 2024, 10:38 PM)
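@mschatz Here is my histo file: 19_kmer.histo.txt (https://github.com/user-attachments/files/17143946/19_kmer.histo.txt). I used quality-controlled next-generation sequencing data to estimate the genome size, with k = 19. The assembled genome is about 2.6 Mb, but the estimated genome size is 5.2 Mb. These are the parameters I used: https://github.com/user-attachments/assets/0896a6bb-7567-45ac-91f5-f07674e4eaba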
If the genome is expected to be only 2.6 Mbp, I'm guessing this is a bacterium. I'm very surprised by the k-mer distribution (in blue), which has two peaks at ~150x and ~300x. This is usually the signature of a heterozygous sample; perhaps this is not a clonal population? Variation within the population will inflate the genome size estimate. Have you tried running the assembler? I would normally recommend SPAdes for short-read bacterial genome assembly.
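The two-peak signature can be checked programmatically. This is a minimal sketch, assuming a clean histogram: it reports coverages that are strict local maxima (ignoring the low-coverage error spike) and flags a lower peak at about half the coverage of a higher one. The `tol` threshold and the synthetic histogram are hypothetical illustrations, not the user's data; real histograms are noisy and would need smoothing first.

```python
import math


def local_peaks(histo, min_cov=10):
    """Coverages whose count is strictly higher than both neighbours.

    Bins below `min_cov` are skipped to avoid the sequencing-error spike."""
    return [
        c for c in sorted(histo)
        if c >= min_cov
        and histo[c] > histo.get(c - 1, 0)
        and histo[c] > histo.get(c + 1, 0)
    ]


def looks_heterozygous(peaks, tol=0.15):
    """True if a lower peak sits at roughly half the coverage of a higher one."""
    return any(abs(lo / hi - 0.5) <= tol for lo in peaks for hi in peaks if hi > lo)


def synthetic_histo():
    """Hypothetical histogram with peaks near 150x and 300x (not real data)."""
    return {
        c: round(1000 * math.exp(-((c - 150) / 20) ** 2 / 2)
                 + 3000 * math.exp(-((c - 300) / 30) ** 2 / 2))
        for c in range(1, 501)
    }


peaks = local_peaks(synthetic_histo())
print(peaks, looks_heterozygous(peaks))
```

In a mixed or heterozygous sample, k-mers spanning variant sites occur in only part of the population, so they form the half-coverage peak; they also count as extra distinct k-mers, which is why such variation inflates the size estimate.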
I would also note that you probably have too much coverage for the assembler. I would recommend randomly downsampling to keep only 33% of the reads; after this, your peak at 150x will shift to ~50x and the one at 300x to ~100x. This will often really improve the assembly.
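For paired-end data the key detail is to make one keep/drop decision per read pair so R1 and R2 stay in sync (tools such as `seqtk sample`, run with the same `-s` seed on both files, are commonly used for this). The sketch below is a minimal illustration of that idea, assuming well-formed 4-line FASTQ records in matching order; the script name `subsample.py` and the output names `sub_1.fq`/`sub_2.fq` are hypothetical.

```python
import gzip
import random
import sys
from itertools import islice


def fastq_records(lines):
    """Yield 4-line FASTQ records from an iterable of lines."""
    it = iter(lines)
    while True:
        rec = list(islice(it, 4))
        if not rec:
            return
        if len(rec) != 4:
            raise ValueError("truncated FASTQ record")
        yield rec


def downsample_pairs(r1_lines, r2_lines, fraction, seed=11):
    """Yield (R1 record, R2 record) pairs, each kept with probability `fraction`.

    A single random draw per pair keeps mates in sync between the two files."""
    rng = random.Random(seed)
    for rec1, rec2 in zip(fastq_records(r1_lines), fastq_records(r2_lines)):
        if rng.random() < fraction:
            yield rec1, rec2


def expected_peak(coverage, fraction):
    """K-mer peaks scale linearly with the fraction of reads kept."""
    return coverage * fraction


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python subsample.py R1.fq.gz R2.fq.gz 0.33
    r1_path, r2_path, frac = sys.argv[1], sys.argv[2], float(sys.argv[3])
    with gzip.open(r1_path, "rt") as f1, gzip.open(r2_path, "rt") as f2, \
            open("sub_1.fq", "w") as o1, open("sub_2.fq", "w") as o2:
        for rec1, rec2 in downsample_pairs(f1, f2, frac):
            o1.writelines(rec1)
            o2.writelines(rec2)
```

Keeping 33% of the pairs moves a 150x peak to about `expected_peak(150, 0.33)` = 49.5x and a 300x peak to about 99x.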
Here is the link to the profile with default parameters: http://genomescope.org/genomescope2/analysis.php?code=ubcE9uhKsJjDwDLeHN25
Good luck
Mike
(In reply to luo9595, Thu, Sep 26, 2024, 2:38 AM)
Hi, I'm using Jellyfish + GenomeScope on paired-end NGS data. GenomeScope reports that the model failed to converge, and the result does not look correct. I tried changing k from 21 to 19, but it didn't help. I'm fairly sure there is enough sequencing data. When I turn off the -C option in Jellyfish, GenomeScope seems to work well. What could be the reason? Can anyone help? The histo file is attached below. Thanks!
Here are the commands I used:
jellyfish count -C -m 19 -t 16 -s 16G <(zcat CL100122025_L01_read_1.fq.gz) <(zcat CL100122025_L01_read_2.fq.gz) -o reads.jf
jellyfish histo -t 16 reads.jf > reads.histo
Rscript genomescope.R reads.histo 19 150 result
reads.histo.txt
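Before running GenomeScope, a quick sanity check on `reads.histo` can help. This is a minimal sketch of the classic back-of-the-envelope estimate, NOT what GenomeScope does (GenomeScope fits a model to the whole histogram): skip the error tail up to the first valley, take the tallest remaining bin as the peak, and divide total k-mers by that peak. The script name `naive_size.py` is hypothetical.

```python
import sys
from pathlib import Path


def parse_histo(lines):
    """Parse `jellyfish histo` output: one 'coverage count' pair per line."""
    histo = {}
    for line in lines:
        if line.strip():
            cov, count = line.split()[:2]
            histo[int(cov)] = int(count)
    return histo


def naive_genome_size(histo):
    """Crude estimate, not GenomeScope's model.

    1. Walk down the low-coverage error tail until counts stop decreasing.
    2. Take the most frequent coverage above that valley as the peak.
    3. Genome size ~ total solid k-mers / peak coverage.
    """
    covs = sorted(histo)
    i = 0
    while i + 1 < len(covs) and histo[covs[i + 1]] < histo[covs[i]]:
        i += 1
    valley = covs[i]
    solid = {c: n for c, n in histo.items() if c >= valley}
    peak = max(solid, key=solid.get)
    total = sum(c * n for c, n in solid.items())
    return valley, peak, total / peak


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python naive_size.py reads.histo
    lines = Path(sys.argv[1]).read_text().splitlines()
    valley, peak, size = naive_genome_size(parse_histo(lines))
    print(f"error valley at {valley}x, peak at {peak}x, ~{size / 1e6:.2f} Mb")
```

On a two-peak histogram this picks whichever peak is taller, so a peak at half the true coverage can double the estimate; the same happens when canonical counting is off.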