ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Adjust parameters to improve assemblies #269

Open dgs108 opened 7 months ago


I have ~21X PacBio CCS reads and have produced multiple assemblies using hifiasm, flye, and wtdbg2. The number of contigs per assembly ranges from 4,965 (hifiasm) to 20,315 (flye), but wtdbg2 produces the best overall assembly (5,004 contigs; N50: 2,788,165 bp; largest contig: 14,599,089 bp). However, I would like to improve this assembly further and would appreciate advice on parameters.
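To compare assemblies from different tools or parameter runs on the same footing, it helps to compute the reported metrics (contig count, largest contig, N50) the same way for every FASTA. The function below is a minimal sketch; the name `asm_stats` is hypothetical and not part of wtdbg2. It assumes a plain (uncompressed) FASTA and uses the common convention that N50 is the length of the contig at which the running sum of lengths, sorted longest first, first reaches at least half of the total.

```shell
#!/usr/bin/env bash
# asm_stats: hypothetical helper (not part of wtdbg2). Prints contig count,
# total length, largest contig and N50 for one assembly FASTA.
asm_stats() {
  awk '
    # Emit the length of each record; sequences may span several lines.
    /^>/ { if (seen) print len; len = 0; seen = 1; next }
    { gsub(/\r/, ""); len += length($0) }
    END { if (seen) print len }
  ' "$1" | sort -nr | awk '
    # Input: contig lengths sorted longest first.
    { len[NR] = $1; total += $1 }
    END {
      acc = 0; n50 = 0
      for (i = 1; i <= NR; i++) {
        acc += len[i]
        # N50: first contig at which the running sum reaches half the total.
        if (acc >= total / 2) { n50 = len[i]; break }
      }
      # %.0f avoids integer overflow in some awk builds for Gbp-scale totals.
      printf "contigs=%d total=%.0f largest=%.0f N50=%.0f\n", NR, total, len[1], n50
    }'
}
```

Usage: `asm_stats wtdbg_02.21.24.ctg.fa`. For a gzipped FASTA, process substitution works because the function only reads `"$1"`: `asm_stats <(zcat assembly.fa.gz)`.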

The species is a hammerhead shark with a genome size of ~2.7 Gbp, and shark genomes are highly repetitive. I have tried wtdbg2 presets 1, 3, and 4; preset 4 produced the assembly with the fewest contigs. Adding `-L 5000` to `-x ccs` only marginally reduced the number of contigs. My commands are below.

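```shell
# Assemble: CCS preset, ~2.7 Gbp genome, 20 threads, overwrite existing
# outputs (-f), and drop reads shorter than 5,000 bp (-L 5000)
wtdbg2 -x ccs -g 2.7g -i hifi_wo_adapters_mtdna.fastq.gz -t 20 -o wtdbg_02.21.24 -f -L 5000

# Consensus: build contig sequences from the layout file written by wtdbg2
wtpoa-cns -i wtdbg_02.21.24.ctg.lay.gz -t 20 -o wtdbg_02.21.24.ctg.fa -f
```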
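When testing whether a parameter such as `-L` actually changes contiguity, it is easy for successive runs to overwrite each other's outputs. The sketch below is a hypothetical dry-run helper (`sweep_cmds`, the `READS`/`THREADS` variables, and the `wtdbg_ccs_L<value>` prefix scheme are all illustrative assumptions): it reuses only the flags from the two commands above and prints one assemble/consensus pair per `-L` value, each with its own output prefix. It executes nothing, and the values passed to it are examples, not recommendations.

```shell
#!/usr/bin/env bash
# sweep_cmds: hypothetical dry-run helper. For each -L value given as an
# argument, print the wtdbg2 and wtpoa-cns commands from this issue with a
# separate output prefix, so runs do not overwrite each other.
# Only prints commands; nothing is executed.
READS="hifi_wo_adapters_mtdna.fastq.gz"
THREADS=20

sweep_cmds() {
  local min_len prefix
  for min_len in "$@"; do
    prefix="wtdbg_ccs_L${min_len}"
    echo "wtdbg2 -x ccs -g 2.7g -i ${READS} -t ${THREADS} -o ${prefix} -f -L ${min_len}"
    echo "wtpoa-cns -i ${prefix}.ctg.lay.gz -t ${THREADS} -o ${prefix}.ctg.fa -f"
  done
}
```

Usage: `sweep_cmds 5000 8000 10000 > sweep.sh`, review the generated file, then run it with `bash sweep.sh` and compare the resulting `.ctg.fa` files. Printing rather than executing keeps each multi-hour run explicit and easy to schedule on a cluster.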

Please advise on which parameters to tweak, and whether I should polish between the steps above.

Thanks!