ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Parameter choice for big, repetitive genomes #230

Open · sighe opened this issue 3 years ago

sighe commented 3 years ago

Thank you for developing a fast, high-quality assembler and for following up on the issues here. We are currently working on animal genomes larger than 4 Gb with a high proportion (>60%) of repetitive elements.

Here is our experience with our latest species, which has a 4.7 Gb genome, assembled from PacBio CLR reads. Following issue #218, we tried the parameter sets '-l 6000 -m 200' and '-l 6000 -m 600' in addition to the defaults. The results did not differ much, but both runs with '-l 6000' produced a total assembly size 0.2 Gb larger than the default run, which was probably expected.
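For anyone reproducing this comparison, a minimal Python sketch that only prints the three wtdbg2 command lines (defaults, '-l 6000 -m 200', '-l 6000 -m 600'), each with its own output prefix. The read file name, genome size string, thread count and output prefixes are hypothetical placeholders, and the `-x rs` preset is an assumption for RS II CLR data (Sequel CLR may call for a different preset).

```python
import shlex

# Hypothetical inputs: replace with your own reads, genome size and threads.
READS = "clr_reads.fa.gz"
GENOME_SIZE = "4.7g"
THREADS = 32

# The three runs described above: defaults, then -l 6000 with -m 200 or -m 600.
PARAM_SETS = {
    "default": [],
    "l6000_m200": ["-l", "6000", "-m", "200"],
    "l6000_m600": ["-l", "6000", "-m", "600"],
}


def wtdbg2_command(prefix, extra):
    """Build one wtdbg2 argument list (ASSUMPTION: CLR preset "rs")."""
    base = ["wtdbg2", "-x", "rs", "-g", GENOME_SIZE, "-t", str(THREADS),
            "-i", READS, "-fo", prefix]
    # Run-specific options are appended after the shared ones.
    return base + extra


if __name__ == "__main__":
    for name, extra in PARAM_SETS.items():
        # Print a shell-safe command line; nothing is executed here.
        print(shlex.join(wtdbg2_command(f"asm_{name}", extra)))
```

Keeping the options in a dictionary makes it easy to add further sets without changing the shared part of the command.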

Do you have any recommended parameter settings for such large, repetitive genomes? Should we use '-R -s' and '--aln-dovetail -1', as you suggested in #218? Is there a recommended value for '-s' in particular?

ruanjue commented 3 years ago

Add -R, and try --aln-dovetail -1 or --aln-dovetail 1024, together with -l 6000. You can quickly reload the alignments with --load-alignments, then increase -s and -l when building the assembly graph.
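One way to read this advice as a two-stage workflow, sketched in Python that only prints command lines: stage 1 runs with -R, one of the two --aln-dovetail settings and -l 6000; stage 2 reuses those alignments via --load-alignments with larger -s and -l. Several parts are assumptions, not confirmed by the reply: the file names, thread count and `-x rs` preset are hypothetical, the other stage-1 options are assumed to be kept unchanged in stage 2, the saved-alignments path `<prefix>.alignments.gz` should be checked against what your wtdbg2 version actually writes, and the -s/-l values in the loop are purely illustrative, not tuned recommendations.

```python
import shlex

# Hypothetical inputs: substitute your own reads, genome size and threads.
READS = "clr_reads.fa.gz"
GENOME_SIZE = "4.7g"
THREADS = 32


def base_args(prefix):
    # Options shared by both stages (ASSUMPTION: CLR preset "rs"), with -R on.
    return ["wtdbg2", "-x", "rs", "-g", GENOME_SIZE, "-t", str(THREADS),
            "-i", READS, "-R", "-fo", prefix]


def stage1(prefix, dovetail):
    """Full run: realignment, one --aln-dovetail setting, -l 6000."""
    return base_args(prefix) + ["--aln-dovetail", str(dovetail), "-l", "6000"]


def stage2(prefix, dovetail, alignments, min_sim, min_len):
    """Graph rebuild from saved alignments with larger -s and -l."""
    return base_args(prefix) + ["--aln-dovetail", str(dovetail),
                                "--load-alignments", alignments,
                                "-s", str(min_sim), "-l", str(min_len)]


if __name__ == "__main__":
    for dovetail in (-1, 1024):  # the two settings suggested in the reply
        run = f"asm_R_dt{dovetail}"
        print(shlex.join(stage1(run, dovetail)))
        # ASSUMPTION: stage 1 saved its alignments at this path.
        saved = f"{run}.alignments.gz"
        # Illustrative -s/-l pairs only; pick your own sweep.
        for s, l in ((0.1, 6000), (0.1, 8000)):
            print(shlex.join(stage2(f"{run}_s{s}_l{l}", dovetail, saved, s, l)))
```

The point of splitting the stages is that the expensive all-vs-all alignment is done once per --aln-dovetail setting, while the cheaper graph-building step can be repeated for each -s/-l combination.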