marbl / merqury

k-mer based assembly evaluation

QV at +inf #38

Closed: francoissabot closed this issue 3 years ago

francoissabot commented 3 years ago

Hi Folks

I am very happy with this tool, but I cannot figure out the quality value (QV) of my assembly... I get a value of +inf.

head *.qv
==> MerquryOutput.qv <==
tog7291FlyeRacon3Medaka 0   9680225839  +inf    0

Completeness is at 100%. I suspect +inf means we have reached the theoretical maximum, but can we get a finite value (e.g. 60)?
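The +inf follows directly from how Merqury turns k-mer counts into a QV. The second column of the .qv line is the number of k-mers found only in the assembly and the third is the number of assembly k-mers it is compared against. Treating a k-mer as error-free with probability (1 - E)^k gives E = 1 - (1 - asm_only/total)^(1/k) and QV = -10 log10(E), so an asm-only count of 0 gives E = 0 and an infinite QV. A minimal Python sketch of that arithmetic, assuming those column meanings and k=20 as used in this thread (the function name is illustrative, not part of Merqury); it reproduces the Flye and Racon rows posted later in the thread:

```python
import math


def merqury_qv(asm_only: int, total: int, k: int = 20) -> tuple[float, float]:
    """Return (QV, per-base error rate) from the two k-mer counts of a .qv row.

    Assumed model: a k-mer is error-free with probability (1 - E)**k, and the
    fraction of assembly k-mers that are NOT asm-only estimates that probability.
    """
    if total <= 0 or not 0 <= asm_only <= total:
        raise ValueError("need total > 0 and 0 <= asm_only <= total")
    error = 1.0 - (1.0 - asm_only / total) ** (1.0 / k)
    if error <= 0.0:
        # No asm-only k-mers: the estimated error rate is 0, so
        # -10*log10(0) diverges; the .qv file prints this as +inf.
        return math.inf, 0.0
    return -10.0 * math.log10(error), error


if __name__ == "__main__":
    print(merqury_qv(30020657, 348802641))  # QV ≈ 23.48, error ≈ 0.00449
    print(merqury_qv(0, 9680225839))        # (inf, 0.0)
```

Note that a zero in the second column yields +inf regardless of what the third column holds, so +inf on its own does not show that the counts were produced from the right databases.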

arangrhie commented 3 years ago

Hello @francoissabot,

Has meryl completed with no issues? Were the spectra-cn plots as expected? If so, congrats, I guess your assembly has no base errors!

Try the following on the command line just to make sure everything is working as expected.

meryl difference asm.meryl read.meryl output asm.only.meryl
meryl statistics asm.only.meryl | head

If asm.only.meryl has no kmers (i.e., it is empty), you could say that all of the kmers found in the assembly were found in the reads, indicating that no obvious base-level error was found.
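The same emptiness check can be scripted. This is a hedged Python sketch: it runs `meryl statistics` as in the commands above and assumes the summary lines look like `distinct   12345  (...)`, i.e. a label followed by an integer. The helper names are hypothetical, and the parser refuses to answer when it finds no `distinct` line, so an unexpected output layout cannot be mistaken for an empty database.

```python
import subprocess

SUMMARY_LABELS = ("unique", "distinct", "present", "missing")


def parse_meryl_statistics(text: str) -> dict:
    """Collect the summary counts from `meryl statistics` output.

    Assumes each summary line starts with a label followed by an integer.
    Only the first occurrence of each label is kept, so later lines
    (e.g. the histogram section) cannot overwrite it.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] in SUMMARY_LABELS and fields[0] not in counts:
            try:
                counts[fields[0]] = int(fields[1])
            except ValueError:
                continue  # label word followed by text, not a count
    return counts


def meryl_statistics(db: str) -> dict:
    """Run `meryl statistics <db>` and parse its summary block."""
    result = subprocess.run(["meryl", "statistics", db],
                            capture_output=True, text=True, check=True)
    return parse_meryl_statistics(result.stdout)


def no_asm_only_kmers(stats: dict) -> bool:
    """True when the asm-only db holds no distinct k-mers."""
    if "distinct" not in stats:
        raise ValueError("no 'distinct' count found; unexpected output format?")
    return stats["distinct"] == 0
```

Usage, after running `meryl difference` as shown: `no_asm_only_kmers(meryl_statistics("asm.only.meryl"))`.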

Asking out of curiosity: what k size did you use? It seems like a huge genome, with 9G distinct kmers in the assembly. What is the genome size?

francoissabot commented 3 years ago

In fact everything was OK, both the run and the PNG plots... I ran it on the different polishing steps (raw at Q23 and racon-polished at Q27) and found this +inf value after medaka (my BUSCO score is then higher than 97% :D). The genome is not so big: it is rice, 380 Mb, and I used a k-mer size of 20, selected following the meryl/Merqury protocol. And yes, there are no asm-only kmers :D Thanks for your help!

arangrhie commented 3 years ago

Hmm, what is your assembly size? The number above, 9680225839, is the number of distinct kmers found in the given asm.meryl db. It is telling me that there are 9.6G kmers. I wonder if you gave read.meryl instead of asm.meryl.
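The reasoning behind that suspicion is a simple bound: whether the third column counts distinct or total k-mers, a database counted from the assembly cannot hold more k-mers than the assembly has k-mer positions, which is at most L - k + 1 per sequence. For the ~390 Mb assembly reported in this thread that is under 0.4G, far below 9.6G. A hypothetical Python helper for this check (the names are illustrative, not part of Merqury); when only the total assembly length is known, passing it as a single sequence gives a looser but still valid bound.

```python
def max_kmer_positions(seq_lengths, k=20):
    """Upper bound on the k-mers a set of sequences can yield: L - k + 1 each."""
    return sum(max(length - k + 1, 0) for length in seq_lengths)


def exceeds_assembly(kmers_in_db, seq_lengths, k=20):
    """True if a db supposedly counted from the assembly holds more k-mers than
    the assembly has k-mer positions, i.e. it cannot come from that assembly."""
    return kmers_in_db > max_kmer_positions(seq_lengths, k)


if __name__ == "__main__":
    print(exceeds_assembly(9680225839, [390_000_000]))  # True: not this assembly
    print(exceeds_assembly(348802641, [390_000_000]))   # False: plausible
```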

francoissabot commented 3 years ago

The assembly is 390 Mb, as expected for this variety. I followed the complete how-to twice and always obtained this result for the 3 steps... Do you think there is a mistake? The qv file values are:

==> tog7291Flye.qv <==
tog7291Flye 30020657 348802641 23.4777 0.00448983
==> tog7291FlyeRacon3.qv <==
tog7291FlyeRacon3 29084810 348200133 23.6134 0.00435175
==> tog7291FlyeRacon3Medaka.qv <==
tog7291FlyeRacon3Medaka 0 9680225839 +inf 0

Indeed, it says 9.6G at the last step, but I launched the analysis in a bash loop for the 3 versions... I'll relaunch manually and get back in touch.
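The thread never pins down why the loop worked for two of the three assemblies. As a hedged sketch of one way to rule out database mix-ups when batching, here is a Python version of such a loop: every assembly gets its own db named after its FASTA file, duplicate names are rejected up front, and `check=True` stops at the first failed step instead of letting a later step read a missing or stale db. The command is the `meryl count` invocation used in this thread; the function names and the Flye/Racon FASTA file names are hypothetical.

```python
import subprocess
from pathlib import Path


def count_commands(fastas, k=20):
    """Build one `meryl count` command per assembly, each writing its own db
    in the current directory, named after the FASTA file."""
    commands, seen = [], set()
    for fasta in fastas:
        db = Path(fasta).with_suffix(".meryl").name
        if db in seen:
            raise ValueError(f"two assemblies would share the db {db}")
        seen.add(db)
        commands.append(["meryl", "count", f"k={k}", str(fasta), "output", db])
    return commands


def count_all(fastas, k=20):
    """Run the commands in order, stopping at the first failure."""
    for cmd in count_commands(fastas, k):
        subprocess.run(cmd, check=True)
```

For example, `count_all(["tog7291Flye.fasta", "tog7291FlyeRacon3.fasta", "tog7291FlyeRacon3Medaka.fasta"])` would write `tog7291Flye.meryl`, `tog7291FlyeRacon3.meryl` and `tog7291FlyeRacon3Medaka.meryl`.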

arangrhie commented 3 years ago

Yeah, there is definitely something wrong. Check your assembly .meryl db; the following will tell you what your asm.meryl has in it:

meryl statistics asm.meryl | head

Recount the kmers from your assembly fasta file, e.g.

meryl count k=20 tog7291FlyeRacon3Medaka.fasta output asm2.meryl
meryl statistics asm2.meryl | head

and see if they are identical.

francoissabot commented 3 years ago

OK, my fault: after relaunching everything, that value turned out not to be the right one... However, since I used a loop, I cannot figure out why it worked for 2 of the 3 assemblies...

arangrhie commented 3 years ago

Let me know if there is anything else I can help with!