schatzlab / genomescope

Fast genome analysis from unassembled short reads
Apache License 2.0

using genomescope result to determine additional sequencing needs for size estimation #29

Open jbh-cas opened 4 years ago

jbh-cas commented 4 years ago

Thanks for the GenomeScope tool, we have used it successfully for several genomes we have assembled. Now we are trying to determine the genome size of a spider. Spider genomes can be as small as 0.8G and up to 5.5G with most in the 1.5G-2.5G range.

We have total R1 + R2 reads of 98,272,048 and total bases of 14,481,220,250.

The k21 and k17 GenomeScope 2.0 runs show:

Sel_R1R2_k21_reads.hist 37,359,898 bp http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=lF9U3QU9sf9PKDwfrPfW
Sel_R1R2_k17_reads.hist 77,184,108 bp http://qb.cshl.edu/genomescope/genomescope2.0/analysis.php?code=8tXxoeraIllggYPpHfX3

The size estimation is almost certainly for a subset of the genome. Is there a way to use the GenomeScope results to estimate how much additional sequencing we might need to get a better result? Our back-of-the-envelope guess, assuming 2.5G, was somewhere around 80G bases total, but that has very little foundation.

Any insight appreciated, thanks so much

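Jim Henderson
California Academy of Sciences

Sel_R1R2_k17_reads.histo.txt https://github.com/schatzlab/genomescope/files/3816716/Sel_R1R2_k17_reads.histo.txt
Sel_R1R2_k21_reads.histo.txt https://github.com/schatzlab/genomescope/files/3816717/Sel_R1R2_k21_reads.histo.txt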
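
The back-of-the-envelope reasoning in the question above can be sketched in a few lines of Python. This is only a rough illustration, not something GenomeScope provides: the function names are made up for this example, the candidate genome sizes are the ones quoted in the post, and the k-mer coverage formula is the usual approximation that a read of length L contributes L - k + 1 k-mers.

```python
# Rough coverage arithmetic using the read and base totals quoted in the post.
# All function names here are hypothetical helpers, not part of GenomeScope.

TOTAL_READS = 98_272_048
TOTAL_BASES = 14_481_220_250


def base_coverage(total_bases, genome_size):
    """Average number of times each base of the genome is sequenced."""
    return total_bases / genome_size


def kmer_coverage(base_cov, mean_read_len, k):
    """Approximate k-mer coverage: a read of length L yields L - k + 1 k-mers."""
    return base_cov * (mean_read_len - k + 1) / mean_read_len


def bases_needed(target_base_cov, genome_size):
    """Total bases required to reach a target base coverage."""
    return target_base_cov * genome_size


mean_read_len = TOTAL_BASES / TOTAL_READS

# Genome sizes mentioned in the post (smallest, typical range, largest).
for size in (0.8e9, 1.5e9, 2.5e9, 5.5e9):
    cov = base_coverage(TOTAL_BASES, size)
    print(
        f"genome {size / 1e9:.1f}G: base coverage {cov:.1f}x, "
        f"k=21 coverage {kmer_coverage(cov, mean_read_len, 21):.1f}x, "
        f"k=17 coverage {kmer_coverage(cov, mean_read_len, 17):.1f}x"
    )
```

Under the 2.5G assumption the current data comes out at a little under 6x base coverage (about 5x k-mer coverage at k=21), and the 80G guess corresponds to `bases_needed(32, 2.5e9)`, i.e. 32x base coverage. What target coverage is actually sufficient depends on the genome and the data quality, so the target value is an input to this sketch rather than something it can decide.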

mschatz commented 4 years ago

You have poorly resolved peaks with no separation between the errors and real genome kmers, so GenomeScope is getting really confused. Are you sequencing from a single individual? If not, that might also be contributing to the poor resolution. Unfortunately, without cleaner data it is not possible to make any predictions on genome size or other characteristics.

Mike
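
As a rough illustration of what "no separation between the errors and real genome kmers" means in a histogram, here is a naive Python check. It assumes the histogram is two whitespace-separated columns per line (k-mer multiplicity, then number of distinct k-mers), and it is not how GenomeScope fits its model: it only looks for a first local minimum (the valley after the error k-mers) followed by a higher peak. The function names are hypothetical, and real data is noisy enough that smoothing would probably be needed.

```python
# Naive check for a valley between low-multiplicity error k-mers and a genomic peak.
# Assumes two columns per line: multiplicity, count. Helper names are hypothetical.


def parse_histo(text):
    """Parse 'multiplicity count' lines into a list of (int, int) tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            rows.append((int(parts[0]), int(parts[1])))
    return rows


def find_valley_and_peak(rows):
    """Return (valley_multiplicity, peak_multiplicity), or None if the counts
    never dip and rise again, i.e. errors and genomic k-mers are not separated."""
    counts = [count for _, count in rows]
    for i in range(1, len(counts) - 1):
        if counts[i] < counts[i - 1] and counts[i] < counts[i + 1]:
            # First strict local minimum: treat it as the error/genome boundary,
            # then take the highest count to its right as the main peak.
            peak = max(range(i + 1, len(counts)), key=counts.__getitem__)
            return rows[i][0], rows[peak][0]
    return None


# Toy histograms (made-up numbers, not the data attached to this issue).
resolved = parse_histo("1 1000\n2 400\n3 100\n4 150\n5 300\n6 200\n")
unresolved = parse_histo("1 1000\n2 400\n3 100\n4 50\n")

print(find_valley_and_peak(resolved))
print(find_valley_and_peak(unresolved))
```

On the toy inputs, the first histogram has a valley and a later peak, while the second only decays, which is the situation described in the reply: there is nothing for a model to latch onto as the genomic peak.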
