Closed mikolmogorov closed 4 years ago
Hi,
I previously assembled the ONT Zymo dataset with wtdbg2 -t 64 -i Zymo-GridION-LOG-BB-SN.fq.gz -fo dbg -x ont --node-max 1000 -e 2, but stopped there because I am not sure how to evaluate the result. I guess --node-max will be more important in meta-assembly. It would be valuable if someone could provide an evaluation script suitable for metagenomes.
Best, Jue
Thanks, I'll give it a try. For reference evaluations, I can recommend metaQUAST (http://quast.sourceforge.net/metaquast) - in my case it was very helpful.
Best, Mikhail
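For anyone who wants to script the metaQUAST evaluation suggested above, here is a minimal sketch that only builds the command line. The file names are placeholders, and it assumes metaquast.py is on the PATH and that the reference genomes are available as separate FASTA files.

```python
def metaquast_command(assemblies, references, out_dir, threads=8):
    """Build an argv list for metaQUAST (the command is not executed here)."""
    return [
        "metaquast.py",
        *assemblies,                 # one or more contig FASTA files to evaluate
        "-r", ",".join(references),  # comma-separated reference genomes
        "-o", out_dir,               # output directory for the report
        "-t", str(threads),          # number of threads
    ]


# Placeholder file names, for illustration only.
cmd = metaquast_command(["dbg.ctg.fa"], ["ref_a.fa", "ref_b.fa"], "metaquast_out")
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```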
Thanks for the information.
Hi,
Just wanted to get back to you with my tests. These parameters indeed improved the coverage of the Zymo Even dataset (total assembly size 28Mb -> 55Mb). On the other hand, they seem to have hurt contiguity: NG50 dropped from 2.7Mb with the default parameters to 614Kb with the custom parameters (G = 25Mb for both statistics).
Best, Mikhail
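For readers comparing the numbers above: NG50 is computed against a fixed genome size G rather than the assembly's own total length, which is why the two assemblies (28Mb and 55Mb) can be compared at the same G = 25Mb. A minimal sketch of the calculation, assuming contig lengths are already available as a list of integers (the function name is just for illustration):

```python
def ng50(contig_lengths, genome_size):
    """Return the length L such that contigs of length >= L together
    cover at least half of genome_size. Returns 0 if the assembly is
    too small to reach that half."""
    target = genome_size / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return 0


# Toy example: with G = 20, half is 10; contigs 5 + 4 + 3 reach 12.
print(ng50([5, 4, 3, 2, 1], 20))
```

Using the assembly's own total length instead of G would give N50, which is not comparable between assemblies of very different total size.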
Thanks. It looks like more careful operations on the assembly graph might give a better combination of assembly size and NG50.
BTW, I am trying to develop a new graph-cleaning algorithm based more on read paths. If it works better on metagenomes, I will post the results in this thread.
Best, Jue
That sounds like it might improve difficult regions like repeats too. I would be very interested in testing it. Thanks!
We've been assembling our ONT data with -K 10000 --max-node 6000 -S1, and varying -p, -e and -L for experimentation. For wtdbg2 v2.4, I've also added -X 6000 -g 62m, which seems to get output very close to the version we used in our preprint. We evaluated the contig quality by generating dotplots of their identity to some recently published corresponding PacBio references, using scripts that can be found in our repository.
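A sweep over -p, -e and -L like the one described above could be scripted roughly as follows. This is only a sketch: the values swept and the output prefix naming are made up for illustration, and the fixed flags are passed through exactly as given by the caller, so they should be checked against the help output of the wtdbg2 version in use.

```python
from itertools import product


def wtdbg2_sweep(reads, base_flags, p_values, e_values, l_values):
    """Build one wtdbg2 argv list per (-p, -e, -L) combination.
    Commands are returned, not executed."""
    commands = []
    for p, e, min_len in product(p_values, e_values, l_values):
        prefix = f"asm_p{p}_e{e}_L{min_len}"  # hypothetical naming scheme
        commands.append([
            "wtdbg2", "-i", reads, "-fo", prefix,
            *base_flags,  # fixed flags, passed through unchanged
            "-p", str(p), "-e", str(e), "-L", str(min_len),
        ])
    return commands


# Illustrative values only; they are not recommendations.
for cmd in wtdbg2_sweep("reads.fq.gz", ["-K", "10000", "-S1"],
                        p_values=[19, 21], e_values=[2, 3], l_values=[5000]):
    print(" ".join(cmd))
```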
v2.4 fixed some bugs in v2.3 and improved the output efficiency. I am still working on replacing the old graph cleaning with one that fully explores read paths, and will release v2.4 after finishing this.
Hi,
Are there any parameter recommendations for running wtdbg2 on metagenomes? Currently with the 2.3 release, I am getting good representation of all bacteria in the PacBio HMP mock assembly (https://github.com/PacificBiosciences/DevNet/wiki/Human_Microbiome_Project_MockB_Shotgun). On the other hand, the ONT Zymo assembly (https://github.com/LomanLab/mockcommunity) seems to be missing a few species with coverage above the median dataset coverage. I am setting genome size to the total size of all organisms in the mixture - is that right?
Best, Mikhail
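For reference, the "total size of all organisms in the mixture" mentioned above can be computed from the reference FASTA files with something like the sketch below. The file names are placeholders, and whether this sum is the right value to pass as the genome size for a metagenome is exactly the open question here.

```python
import gzip


def fasta_total_length(path):
    """Sum the sequence lengths in a FASTA file (plain or gzipped)."""
    opener = gzip.open if str(path).endswith(".gz") else open
    total = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith(">"):  # skip header lines
                total += len(line.strip())
    return total


def combined_genome_size(reference_paths):
    """Total length across all reference genomes in the mixture."""
    return sum(fasta_total_length(p) for p in reference_paths)


# Usage with placeholder paths:
# size = combined_genome_size(["species_a.fa", "species_b.fa.gz"])
# print(size)  # candidate value for the genome size option
```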