schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

Help understand the output #137

Closed gunjanpandey closed 1 week ago

gunjanpandey commented 3 weeks ago

Could you please help me understand why the estimated genome size is almost half of what I expect in this case? This is a fish genome, and I expect it to be around 1 Gbp.

Do I need to sequence more to increase the coverage?

[Image: linear_plot.png, https://github.com/user-attachments/assets/13db0696-2b77-4e0c-8bfe-cf9540c8ba02]

mschatz commented 2 weeks ago

It is not a perfect fit, but it is within acceptable limits. One thing to note is that the reported genome length will be the haploid length (e.g. human will be reported as 3Gbp and not 6Gbp). But I agree this is small for a fish genome. What is your cutoff for high frequency kmers? I would raise this to 100,000 or 1,000,000 to make sure the most common repeats are accounted for.

Good luck! Mike
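
To illustrate why the high-frequency cutoff matters: in the simple "total kmer instances divided by coverage" view of genome size, any kmers dropped above the cutoff shrink the estimate proportionally. The sketch below is not GenomeScope's actual model, only a rough check. It assumes a two-column histogram (kmer multiplicity, number of distinct kmers); `toy.hist`, its values, and the helper `summarize_hist` are made up for illustration.

```shell
# Toy histogram: column 1 = kmer multiplicity, column 2 = number of distinct kmers.
# The values are invented for illustration only.
cat > toy.hist <<'EOF'
1 1000
20 500
40 100
200000 2
EOF

# Hypothetical helper: sum multiplicity * count over all rows, and separately
# over rows at or below a cutoff, to show what share of kmer instances a
# given cutoff would discard.
summarize_hist() {
  awk -v cutoff="$2" '
    { total += $1 * $2; if ($1 + 0 <= cutoff + 0) kept += $1 * $2 }
    END { printf "total=%.0f kept=%.0f dropped_pct=%.1f\n", total, kept, 100 * (total - kept) / total }
  ' "$1"
}

summarize_hist toy.hist 10000
```

In this toy case the two very high-multiplicity kmers carry most of the kmer instances, so a cutoff of 10,000 discards the bulk of the signal. Running the same helper on a real histogram with cutoffs of 10000, 100000, and 1000000 would show how much of the repeat content each setting leaves out.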

gunjanpandey commented 1 week ago

Thanks for your help, Mike.

I ran the following commands. Where should I set the cutoff?

meryl count k=19 output k19.meryl ${HiFi}

meryl histogram k19.meryl/ > k19_meryl.hist

Rscript ${genomescope} -i k19_meryl.hist -k 19 -o k19_genomescpe

mschatz commented 1 week ago

Your other plot included the high frequency kmers (out past 1e8), so meryl must already be catching these (not all kmer counters do).

See here: https://github-production-user-asset-6210df.s3.amazonaws.com/50389451/363498281-4cbdd88a-e423-4bfc-9afc-a3e391e9daf7.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20240903%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240903T025654Z&X-Amz-Expires=300&X-Amz-Signature=9643ba63e7acab0ef44af77873f27e40b1511cb87b4d951b87a010b896d023b7&X-Amz-SignedHeaders=host&actor_id=196083&key_id=0&repo_id=52390579

Good luck!

Mike
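
One way to confirm this directly from the histogram file, rather than from the plot, is to look at the largest multiplicity it contains. This is a minimal sketch assuming the same two-column layout (multiplicity, number of distinct kmers); `toy_max.hist`, its values, and the helper `max_multiplicity` are hypothetical.

```shell
# Toy histogram in the two-column layout (multiplicity, number of distinct kmers).
# Values are invented for illustration only.
cat > toy_max.hist <<'EOF'
1 1000
20 500
150000000 1
EOF

# Hypothetical helper: print the largest multiplicity present in a histogram file.
max_multiplicity() {
  awk '$1 + 0 > max { max = $1 + 0 } END { printf "%.0f\n", max }' "$1"
}

max_multiplicity toy_max.hist
```

Calling `max_multiplicity k19_meryl.hist` on the real file would show whether the counts extend well past the 100,000 to 1,000,000 range mentioned earlier in the thread; if they do, the counter is not truncating high frequency kmers.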

gunjanpandey commented 1 week ago

Thank you