ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Parameter suggestions for a repetitive, medium-sized plant genome #148

Closed: davidecarlson closed this issue 5 years ago

davidecarlson commented 5 years ago

Hi there, Thanks for making this tool available. It's amazingly fast!

I'm currently trying to assemble a medium-sized (~1.2 Gb) plant genome that has low heterozygosity but is very repetitive. I have ~96x coverage of PacBio data, a mix of older RSII and somewhat more recent Sequel reads (all sequenced between late 2014 and late 2016).

I've run wtdbg2 twice so far, once with "-x rs" and once with "-x sq". In both cases, between 800 and 900 Mb were assembled, with a very low N50 (roughly 40 kb each time).
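For readers unfamiliar with the metric, N50 is the contig length at which the contigs of that length or longer together cover at least half of the total assembly. The following is a minimal Python sketch of that calculation; the function name `n50` and the example lengths are illustrative only and are not taken from this assembly or from wtdbg2 itself.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    # Sort longest first so the running total grows from the largest contigs.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to half of the
        # assembly size defines the N50.
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


# Made-up lengths (in bp) purely for illustration.
example_lengths = [10, 8, 6, 4, 2]
print(n50(example_lengths))
```

A low N50 relative to the assembly size, as reported above, means that most of the assembled bases sit in short contigs.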

I was just wondering if you have any suggestions for parameters I should tweak to hopefully get a larger and/or more contiguous assembly.

Thanks for any advice you can provide! Best, Dave

ruanjue commented 5 years ago

For a complicated plant genome, please try adding the -R option. It will take nearly twice as long to finish the contig assembly, but it is worth trying.
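As a rough sketch of what a rerun with this option might look like, the Python snippet below only assembles and prints a wtdbg2 command line; it does not run the assembler. The helper name `wtdbg2_command`, the file name `reads.fa.gz`, the output prefix `asm`, the thread count, and the choice of the "sq" preset are all assumptions for illustration. Only the "-x rs" / "-x sq" presets, the ~1.2 Gb genome size, and -R come from this thread.

```python
import shlex


def wtdbg2_command(reads, prefix, preset="sq", genome_size="1.2g",
                   threads=16, realign=True):
    """Build (but do not run) a wtdbg2 argument list."""
    cmd = [
        "wtdbg2",
        "-x", preset,        # sequencing preset, e.g. "rs" or "sq"
        "-g", genome_size,   # approximate genome size
        "-i", reads,         # input reads (hypothetical file name)
        "-t", str(threads),  # number of threads (assumed value)
        "-fo", prefix,       # output prefix (hypothetical)
    ]
    if realign:
        cmd.append("-R")     # the option suggested above
    return cmd


# Print the command so it can be reviewed before being run by hand.
print(shlex.join(wtdbg2_command("reads.fa.gz", "asm")))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; the resulting line can be pasted into a shell once the paths and preset have been checked against the actual data.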

davidecarlson commented 5 years ago

Thank you! I will try adding -R and let you know how it goes. Best, Dave

davidecarlson commented 5 years ago

Just wanted to report back that I was able to get a 1.1 Gb assembly by adding the -R flag. It's still very fragmented, but I have MP and Hi-C data that I can hopefully use to increase contiguity. Thanks for your help! Dave