Closed Adamtaranto closed 2 years ago
Hi Adam,
Thanks for your note. We looked into this but are a little confused with what we see. We agree with you that there does appear to be a local maximum at coverage 8, but then the next clear peak is at around 68. If the baseline coverage really is 8, that would imply most of the genome is organized into ~8 copy nearly perfect repeats. We have never seen anything like that before. I wonder if it could be something more straightforward -- the genomescope model reports haploid genome size, but I wonder if your estimates are of the diploid genome size. For example in human, genomescope would report the haploid genome size at about 3Gb, but the diploid genome size is 6Gb. Could this explain it? Then I would expect the peak at 8x is an artifact of sequencing error and the trimming you have used.
Good luck!
Mike
On Wed, May 2, 2018 at 6:00 PM Adam Taranto notifications@github.com wrote:
I'm working with a very repeat-rich fungal genome, with expected haploid genome size around 70 - 80 Mb. Genoscope seems to be locking onto the second (repeat-associated) peak and is producing a much lower than expected genome size estimate.
Here are the kmer coverage counts from the jellyfish histo file, trimmed at 4 and with the manually identified peak at 8. A second peak occurs between counts ~30-110. Jellyfish_K25.pdf https://github.com/schatzlab/genomescope/files/1968989/Jellyfish_K25.pdf
Genoscope is lumping the primary peak in with the low coverge "error" segment. [image: k25_genomescopea] https://user-images.githubusercontent.com/2160099/39550585-a01f5646-4e16-11e8-8d9f-f48c5a54b5e6.png [image: k25_genomescopeb] https://user-images.githubusercontent.com/2160099/39550586-a0372bf4-4e16-11e8-8a30-f8528180dab6.png
Histo file: 25mer_out.txt https://github.com/schatzlab/genomescope/files/1968988/25mer_out.txt
Would it be possible to force Genoscope to fit a model around a user-specified peak?
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/13, or mute the thread https://github.com/notifications/unsubscribe-auth/AAL983ITpiAHnG8muC3E_yE59h88s1Wsks5tuixlgaJpZM4TwOaN .
Hi Mike,
I've managed to get something more sensible by lowering the TYPICAL_ERROR to "2" and START_SHIFT to "1". This seems to prevent the peak at 8x from being ignored when looking for the homozygous peak. I confess that this is supposed to be a haploid species, though I was hoping the genoscope estimate would be ~ n/2. The new unique length is ~38.4 Mb which is indeed approx half the expected genome size.
Do you have any suggestions re modifying the nls() call to fit just two negative binomial distributions?(i.e. single copy peak , and repeat peak).
Thanks for writing a really great tool!
That explains it -- If you are certain the genome is haploid, I would would just double the estimate from the default genomescope run that estimated the genome length to be 31Mbp as the current model assumes the genome is diploid.
Good luck!
Mike
On Fri, May 4, 2018 at 4:56 PM Adam Taranto notifications@github.com wrote:
Hi Mike,
I've managed to get something more sensible by lowering the TYPICAL_ERROR to "2" and START_SHIFT to "1". This seems to prevent the peak at 8x from being ignored when looking for the homozygous peak. I confess that this is supposed to be a haploid species, though I was hoping the genoscope estimate would be ~ n/2. The new unique length is ~38.4 Mb which is indeed approx half the expected genome size.
[image: plot] https://user-images.githubusercontent.com/2160099/39651232-f1a13dec-4f9f-11e8-9897-55d36c79f989.png
Do you have any suggestions re modifying the nls() call to fit just two negative binomial distributions?(i.e. single copy peak , and repeat peak).
Thanks for writing a really great tool!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/13#issuecomment-386732360, or mute the thread https://github.com/notifications/unsubscribe-auth/AAL987CTG29qohKmjlPtJJNJaa1IpkJGks5tvMCZgaJpZM4TwOaN .
Hi Mike, can you tell me what does means the purple lines in my genome_scope graph?
It is supposed to highlight error kmers, meaning the kmers that are not explained by the model fit.
We actually have a new version of genomescope that generally does a better job with the fit. It works pretty much the same as before, although it now has support for higher ploidy values (in addition to diploid samples). Can you give it a try: http://genomescope.org/genomescope2.0/
Good luck
Mike
On Tue, Sep 17, 2019 at 5:51 AM unalibun notifications@github.com wrote:
Hi Mike, can you tell me what does means the purple lines in my genome_scope graph? [image: plot log] https://user-images.githubusercontent.com/51210427/65031388-77f29900-d941-11e9-98a0-b9cccb30e1b1.png
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/13?email_source=notifications&email_token=AABP34Y57LQKNXRFPQXF6FLQKCSDTA5CNFSM4E6A42G2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD637HUI#issuecomment-532149201, or mute the thread https://github.com/notifications/unsubscribe-auth/AABP346ET42LCAZQHUIL2NLQKCSDTANCNFSM4E6A42GQ .
Thank you so much, I will work on it!
Hello Sir,
I hope you are doing well.
I am trying to use genome scope in order to know the repeat content of the genome. I am having a hard time understanding the output. Can you please please help me out with this. The results I obtained are attached below.
I ran the following command in order to get the results.
wget https://github.com/gmarcais/Jellyfish/releases/download/v2.3.0/jellyfish-2.3.0.tar.gz
tar -xvzf jellyfish-2.3.0.tar.gz
wget https://github.com/gmarcais/Jellyfish/releases/download/v2.3.0/jellyfish-linux
cd jellyfish-2.3.0/
./bin/jellyfish count - C -m 21 -s 1000000000 -t 10 *.sra.fasta -o reads.jf
6.make histo file and dragged it to the site.
./jellyfish-2.3.0/bin/jellyfish histo -t 10 reads.jf > reads.histo
Your suggestion will really helpful.
I'm working with a very repeat-rich fungal genome, with expected haploid genome size around 70 - 80 Mb. Genoscope seems to be locking onto the second (repeat-associated) peak and is producing a much lower than expected genome size estimate.
Here are the kmer coverage counts from the jellyfish histo file, trimmed at 4 and with the manually identified peak at 8. A second peak occurs between counts ~30-110. Jellyfish_K25.pdf
Genoscope is lumping the primary peak in with the low coverge "error" segment.
Histo file: 25mer_out.txt
Would it be possible to force Genoscope to fit a model around a user-specified peak?