Open OrsonMM opened 1 month ago
Thanks for writing. Your kmer profile shows a bad fit between the kmer counts you have in the reads (in blue) and the GenomeScope model (in black). As such, I wouldnt trust the genome size estimate or the predicted level of heterozygosity. As you mention the major challenge here is the low coverage: focusing on the blue curve (from your reads) you have a peak at about 10x coverage which is too low for a reliable GenomeScope estimate. This will also lead to a poor assembly. My strong recommendation would be to seek out more coverage. I would also strongly recommend PacBio HiFi reads over short reads for a mammalian de novo assembly.
Good luck!
Mike
On Mon, Aug 12, 2024 at 11:08 AM Orson @.***> wrote:
Dear team Genomescope2.
I am using the tools to understand why I have a bad novo assembly, with more than 1M of contigs and small.
I have illumina sequence of Cavia Porcellus (guinea pig), genome size in GENBANK is around 3 GB. ( https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/)
I had this output in genomescope2:
GENOME_01 :
Rscript genomescope02.R -i $1.histo -k $2 -p 2 -n $1 -o $1_$2_dir
GenomeScope version 2.0 input file = genome_01.histo output directory = genome_01_dir p = 2 k = 21 name prefix = genome_01
property min max Homozygous (aa) 60.4219% 100% Heterozygous (ab) 0% 39.5781% Genome Haploid Length 515,546,895 bp 531,285,372 bp Genome Repeat Length 254,706,772 bp 262,482,392 bp Genome Unique Length 260,840,123 bp 268,802,980 bp Model Fit 48.8344% 74.6587% Read Error Rate 2.91981% 2.91981%
SA42913_transformed_log_plot.png (view on web) https://github.com/user-attachments/assets/e4a6e36a-530a-4e71-bd73-d8fac6cb1f98
I interpret, that Genomescope02 gives me that I have a sequencing that represents 531,285,372 bp of genomic size. In addition, the sequencing is at ~13x. And that I have a high error rate in my reads of almost 3%.
If the total genome size is 3GB in genbank, that means that having 531 Mpb, with 10x coverage, if I would have good information to get a not so bad denovo assembly.
Please, any suggestion.
— Reply to this email directly, view it on GitHub https://github.com/schatzlab/genomescope/issues/135, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABP342N7UEXWYSVPHFIFGLZRDFYBAVCNFSM6AAAAABMMMCSPOVHI2DSMVQWIX3LMV43ASLTON2WKOZSGQ3DCMRVG43DINQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>
Dear team Genomescope2.
I am using the tools to understand why I have a bad novo assembly, with more than 1M of contigs and small.
I have illumina sequence of Cavia Porcellus (guinea pig), genome size in GENBANK is around 3 GB. is diploid (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_034190915.1/)
I had this output in genomescope2:
GENOME_01 :
I interpret, that Genomescope02 gives me that I have a sequencing that represents 531,285,372 bp of genomic size. In addition, the sequencing is at ~13x. And that I have a high error rate in my reads of almost 3%.
If the total genome size is 3GB in genbank, that means that having 531 Mpb, with 10x coverage, if I would have good information to get a not so bad denovo assembly.
Please, any suggestion.