ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

time complexity #196

Closed warrenlr closed 4 years ago

warrenlr commented 4 years ago

Hello,

Thank you for your de novo long read assembly solution, which we have used successfully on a 20Gbp genome.

At the time of assembly, we had only 23X coverage; we now have 40X from ONT reads of 10 kbp or longer. The low-coverage assembly took 6 days (wtdbg2 + wtpoa-cns only). By contrast, your recent publication reports processing 30X of a 32 Gbp genome (Mexican salamander) in 2 days.

### OVERLAP
./wtdbg2 -x ont -g20g -t 120 -fo dbg2 input.fa
### DERIVE CONSENSUS
./wtpoa-cns -t 120 -i dbg2.ctg.lay.gz -fo dbg2.raw.fa

I am not sure why there is such a discrepancy. Our coverage titration experiments suggest that the time complexity of the algorithm may be quadratic. Please let me know, as I'd like to assemble the 40X read set. Any tips for speeding it up through parameter optimization would be appreciated.
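To put the scaling concern in numbers, here is a rough sketch that extrapolates the 6-day, 23X run to 40X. It assumes runtime follows a simple power law in coverage (runtime proportional to coverage raised to some exponent), which is only a working assumption and not a documented property of wtdbg2. The function name `projected_days` is made up for illustration.

```python
def projected_days(base_days, base_cov, target_cov, exponent):
    """Scale a measured runtime to a new coverage.

    Assumes runtime ~ coverage**exponent (a hypothetical power-law model,
    not something the wtdbg2 authors state).
    """
    return base_days * (target_cov / base_cov) ** exponent


# Figures taken from this thread: 6 days at 23X, target coverage 40X.
for label, exponent in (("linear", 1), ("quadratic", 2)):
    days = projected_days(6, 23, 40, exponent)
    print(f"{label}: about {days:.1f} days")
```

Under these assumptions the 40X run would take roughly 10 days if scaling is linear and roughly 18 days if it is quadratic, for wtdbg2 and wtpoa-cns combined.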

Thank you!

ruanjue commented 4 years ago

wtdbg2 assembled the axolotl genome within 2 days using -x rs. Assembling an ONT dataset with -x ont takes much longer. To speed it up, try increasing the k-mer size with -k and -p, or just use -x rs. Also, for large genomes, first find a satisfactory wtdbg2 assembly and only then run wtpoa-cns to generate the FASTA sequences, because the consensus step can take even more time than the contig assembly step.

warrenlr commented 4 years ago

Thank you, I'll experiment with those.

Rene