Help in a weird spectra-cn.fl.png, QV and C interpretation

marbl / merqury

k-mer based assembly evaluation

Help in a weird spectra-cn.fl.png, QV and C interpretation #52

Closed Unalibun closed 2 years ago

Unalibun commented 3 years ago

Hi @arangrhie,

Merqury is such a great tool, congrats!

I assembled a fungus genome with Canu using PacBio reads, polished it with Illumina data, and scaffolded it with Nanopore reads, which seemed the best strategy. The genome is haploid and highly repetitive (90% repeat content). I then evaluated the assembly with Merqury. However, I obtained very odd results that I cannot interpret, especially the huge proportion of k-mers in read-only: does this mean the fungus genome could be bigger than the assembly I got (100 Mb)?

Could you kindly help me with this interpretation?

Thanks a million. I attached my spectra-cn.fl.png plot. The QV and C (completeness) results are, respectively:

asm_scaf 60590 93691673 44.6792 3.4047e-05
asm_scaf all 1186554 2790136 42.5267

[Image: p_ulei_pacbio merqury p_ulei_scaf spectra-cn.fl plot]

Best wishes!

arangrhie commented 3 years ago

Hello @unalibun ,

It seems like your Illumina k-mers are at low frequency. I have limited experience with fungal genomes, but I can imagine they are more challenging than other species (easily contaminated, or difficult to get enough DNA from?).

Have you checked your total sequencing throughput? What does the spectra-cn.hist file look like? Are there any peaks that match your expected sequencing coverage?
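The check suggested above can be sketched roughly in Python. This is only an illustration, not part of Merqury: the function names are hypothetical, and the histogram is assumed to have already been reduced to (multiplicity, count) pairs sorted by multiplicity. The idea is to estimate the coverage you expect from the sequencing throughput, then see whether the k-mer histogram has any peak past the initial low-frequency error slope.

```python
def expected_coverage(total_bases, genome_size):
    """Expected base-level sequencing depth from throughput and genome size."""
    return total_bases / genome_size


def expected_kmer_coverage(base_coverage, read_length, k):
    """Expected k-mer depth: each read of length L yields L - k + 1 k-mers."""
    return base_coverage * (read_length - k + 1) / read_length


def find_coverage_peak(hist):
    """Return the (multiplicity, count) of the main peak, or None if absent.

    hist: list of (multiplicity, count) pairs sorted by multiplicity.
    """
    counts = [count for _, count in hist]
    # Walk down the initial error slope until the counts stop decreasing.
    trough = 0
    while trough + 1 < len(counts) and counts[trough + 1] < counts[trough]:
        trough += 1
    # Look for the highest point at or after the trough.
    peak = max(range(trough, len(counts)), key=lambda i: counts[i])
    # If nothing rises above the trough, the histogram has no peak.
    if peak == trough:
        return None
    return hist[peak]


# Hypothetical example histogram with an error slope and a peak near 7x.
example = [(1, 1000), (2, 400), (3, 100), (4, 50), (5, 80),
           (6, 200), (7, 350), (8, 300), (9, 120)]
print(find_coverage_peak(example))
```

If the function returns None, or the peak sits far from the expected k-mer coverage, that would be consistent with the "no peak" situation described in this thread.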

The QV looks fine: it says Q44.6792 (I assume this is the primary assembly?). Completeness here would be unreliable, as we don't have a good way to distinguish error k-mers from solid k-mers using the peak (because there is no peak). I'd suggest not using the completeness metric from this run.
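For readers wondering how the reported numbers relate to each other, here is a minimal Python sketch. It assumes the QV line lists, in order, the k-mers found only in the assembly, the total k-mers in the assembly, the QV, and the error rate, and that the completeness line lists the solid k-mers found in the assembly, the solid k-mers in the read set, and the percentage. The k-mer size is not stated in this thread; k = 19 is an assumption chosen because it reproduces the first reported result.

```python
import math


def merqury_style_qv(asm_only_kmers, asm_total_kmers, k):
    """Consensus QV and per-base error rate from k-mer survival.

    The fraction of assembly k-mers supported by reads, raised to 1/k,
    approximates the probability that a single base is correct.
    """
    base_correct = (1 - asm_only_kmers / asm_total_kmers) ** (1 / k)
    error_rate = 1 - base_correct
    return -10 * math.log10(error_rate), error_rate


def completeness_percent(solid_in_assembly, solid_in_reads):
    """Share of solid read k-mers that are also present in the assembly."""
    return 100 * solid_in_assembly / solid_in_reads


# Values from the first report in this thread; k = 19 is an assumption.
qv, error_rate = merqury_style_qv(60590, 93691673, k=19)
print(round(qv, 4), error_rate)
print(round(completeness_percent(1186554, 2790136), 4))
```

Note that the completeness figure depends on which read k-mers count as solid, which is why it is unreliable when the histogram has no peak to set that cutoff.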

Best, Arang

klk1409 commented 2 years ago

Hi @arangrhie,

I assembled a diatom genome (about 100 Mb) with Flye using Nanopore reads and polished it with Illumina data. Then I ran Merqury. However, I obtained odd results that I cannot interpret. Diatoms are normally diploid, but that is difficult to see in the histograms.

Could you kindly help me with this interpretation?

Thanks a million. I attached my spectra-cn.fl.png and spectra-asm plots.

The QV and C (completeness) results are, respectively:

bacillariophyta 3926609 105490653 29.3975 0.00114882
bacillariophyta all 52084439 195216558 26.6803

[Images: merqury bacillariophyta spectra-cn.fl, spectra-asm.fl, spectra-asm.ln, and spectra-asm.st plots]

arangrhie commented 2 years ago

Hello @klk1409 ,

It looks to me like your sequencing read set has some contamination. Try Mash Screen to see what's in your read set. A practical runnable script is here.
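As a rough illustration of how a Mash Screen result could be inspected (this is not the script linked above), the Python sketch below ranks hits by identity. It assumes the tab-delimited output of a command such as `mash screen refseq.msh reads.fastq > screen.tab`, with columns identity, shared-hashes, median-multiplicity, p-value, query-ID, and query-comment. The file names, the function name, and the 0.9 identity cutoff are all hypothetical.

```python
def top_screen_hits(lines, min_identity=0.9, limit=10):
    """Return the best-matching references from Mash Screen output lines.

    Each hit is (identity, shared_hashes, total_hashes, query_id, comment).
    """
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        identity = float(fields[0])
        shared, total = (int(x) for x in fields[1].split("/"))
        comment = fields[5] if len(fields) > 5 else ""
        if identity >= min_identity:
            hits.append((identity, shared, total, fields[4], comment))
    # Highest identity first; ties broken by number of shared hashes.
    hits.sort(key=lambda h: (h[0], h[1]), reverse=True)
    return hits[:limit]


# Made-up lines in the assumed column layout, for demonstration only.
demo = [
    "0.99957\t991/1000\t26\t0\tref_A.fna\thypothetical diatom reference",
    "0.85000\t33/1000\t1\t1e-20\tref_B.fna\thypothetical low-identity hit",
    "0.98100\t668/1000\t3\t0\tref_C.fna\thypothetical bacterial reference",
]
for hit in top_screen_hits(demo):
    print(hit)
```

High-identity hits to organisms other than the sequenced species would point to contamination in the read set.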

arangrhie commented 2 years ago

Closing this issue. Feel free to reopen, @klk1409.