ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Suitable for very large genomes? 25Gb (100X coverage) #111

Closed: luca-Wang closed this issue 5 years ago

luca-Wang commented 5 years ago

Hi,

Is wtdbg2 suitable for assembling a 25Gb genome (100X coverage)? Are there any CPU and RAM recommendations for running such a large genome?

Thanks!

ruanjue commented 5 years ago

wtdbg2 is suitable for 25Gb and even larger genomes. For a genome this large, please use -x rs to save CPU time and RAM. You might need 1.5 TB of RAM and about two days of run time on 100 CPU cores.

luca-Wang commented 5 years ago

Thanks a lot! But we have 2.5Tb of PacBio Sequel subreads, so I'm a little confused. I think the parameters should be: -x sq -g 25g -i subreads.fa.gz -t96?

ruanjue commented 5 years ago

Even though you have Sequel reads, -p 21 -AS 4 is necessary. For better results, -R will help, but it takes longer. I suggest first trying -p 21 -AS 4 -R -g 25g --minimal-output. I am not sure how long -R will take, maybe one week. You might want to append --dump-seqs <prefix>.kbmseq so that the reads load quickly if you are not satisfied with the first run and need to run it again.
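
For readers who want to see the suggestion as one command line, here is a minimal dry-run sketch that only assembles and prints it. The input file name and thread count come from the question above, and the remaining wtdbg2 options are the ones suggested in this reply. The output prefix `asm` is hypothetical, and passing it with `-fo` is an assumption that is not stated in this thread, so check every flag against `wtdbg2 --help` before running anything.

```shell
#!/bin/sh
# Dry-run sketch: build the wtdbg2 command line discussed in this thread and print it.
# Nothing is executed, so this works even where wtdbg2 is not installed.

prefix="asm"            # hypothetical output prefix (not from the thread)
reads="subreads.fa.gz"  # input file name from the question above
threads=96              # thread count from the question above

# Options suggested in the reply: -p 21 -AS 4 -R -g 25g --minimal-output
# --dump-seqs <prefix>.kbmseq is the optional dump suggested for faster loading on a rerun.
# ASSUMPTION: the output prefix is passed with -fo; verify against `wtdbg2 --help`.
cmd="wtdbg2 -p 21 -AS 4 -R -g 25g --minimal-output -i $reads -t $threads -fo $prefix --dump-seqs $prefix.kbmseq"

# Print the command for review instead of running it.
echo "$cmd"
```

Printing first is deliberate: with a run that may take about a week and on the order of 1.5 TB of RAM, it is cheaper to review the exact command line than to discover a typo after the job has started.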