Open Haoran-Xue opened 4 months ago
Hi,
I had the same problem with PacBio HiFi sequences of a diploid plant species, and the plot looks like this:
Any help?
Also seeing this with one of our genomes. Out of a batch of four Revio SMRT cells (all the same species), one plot looks like the above. The SMRT cell with this plot has a much higher number of reads than the others, but besides that nothing stands out. The other three plots looked reasonable. I'm curious to know what is causing this.
The automatic model-fitting algorithm can get confused if coverage is too high or if the relationship between the homozygous and heterozygous peaks is ambiguous. The easiest way to address this is to use the "Average k-mer coverage for polyploid genome" parameter, which gives a hint as to where the first peak (the heterozygous peak) is located. For these datasets I would try a value of about 100. If that doesn't work, the next easiest thing to do is downsample the read dataset to reduce the coverage. From a raw read file, you can just use 'head' to select the first N lines of the file (for FASTQ, N should be a multiple of 4 so that no record is cut off). This reduces the number of reads and serves as a random downsample, assuming the reads have not been aligned or otherwise processed.
Good luck!
Mike
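The 'head' downsampling described above could look roughly like the sketch below. It assumes a gzipped FASTQ with exactly four lines per read; the function name `downsample_fastq` and the file names in the usage line are hypothetical, not part of GenomeScope or KMC.

```shell
# Sketch only: keep the first n_reads reads of a gzipped FASTQ.
# Assumes 4 lines per read (no multi-line sequences) and unsorted raw reads.
downsample_fastq() {
    in_fq="$1"     # input, e.g. xxx.hifi.fastq.gz
    n_reads="$2"   # number of reads to keep
    out_fq="$3"    # output path (hypothetical name)
    # head takes lines, so multiply by 4 to avoid cutting a record in half
    gzip -dc "$in_fq" | head -n "$((n_reads * 4))" | gzip > "$out_fq"
}
```

For example, `downsample_fastq xxx.hifi.fastq.gz 1000000 xxx.sub.fastq.gz` would keep the first million reads; the k-mer histogram then has to be recomputed from the smaller file.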
Hello,
I ran kmc and kmc_tools with PacBio HiFi sequences of a diploid plant species:
kmc -m128 -k21 -t40 -ci1 -cs10000 xxx.hifi.fastq.gz xxx xxx_tmp
kmc_tools transform xxx histogram xxx.histo
Then I submitted the histo file to GenomeScope2.0 (http://genomescope.org/genomescope2.0/), with "K-mer length: 21, Ploidy: 2, Max k-mer coverage: -1, Average k-mer coverage for polyploid genome: -1".
This is the linear plot I got:
It seems that the first peak (heterozygous peak) was identified as errors. Is there any way to avoid this?
Thank you!
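When choosing a value for the "Average k-mer coverage for polyploid genome" hint, it can help to check where the peaks in the histogram actually fall. The sketch below is a rough heuristic, not anything GenomeScope does itself: it assumes a two-column "coverage count" histogram such as xxx.histo, skips the initial falling error tail, and prints the coverage of the tallest bin after it. The function name `find_main_peak` is hypothetical. In a diploid the tallest peak may be either the heterozygous or the homozygous one, so the plot still needs to be inspected by eye.

```shell
# Sketch only: report the coverage of the tallest histogram bin that lies
# past the low-coverage error trough.
# Input: whitespace-separated "coverage count" lines, sorted by coverage.
find_main_peak() {
    awk '
        # the error tail ends at the first bin whose count rises again
        NR > 1 && !past_trough && $2 > prev { past_trough = 1 }
        # from then on, track the bin with the largest count
        past_trough && $2 > best { best = $2; best_cov = $1 }
        { prev = $2 }
        END { if (best_cov != "") print best_cov }
    ' "$1"
}
```

Running `find_main_peak xxx.histo` prints a single coverage value, or nothing if the counts never rise after the first bin.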