ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Big total base but low contig length. #169

lancer-lu closed this issue 4 years ago.

lancer-lu commented 4 years ago

Hi, I am running wtdbg2 on Nanopore data to assemble a 4 Mb genome. My read statistics are:

Total reads | 224690
Total bases | 109230708
Mean length (bp) | 486.1
Max length (bp) | 42103
Min length (bp) | 42
Median length (bp) | 328
Mean quality | 10.4
Median quality | 10.3
Read length N50 (bp) | 546

My wtdbg2 commands:

    wtdbg2 -i ab.fq -fo ab -t 16 -x ont -g 4.3m
    wtpoa-cns -t 16 -i ab.ctg.lay.gz -fo ab.raw.fa

My pipeline: wtdbg2 → 2 iterations of racon → medaka → 2 iterations of pilon.

My assembly quality:

Contigs | 43
Largest contig (bp) | 49529
Total length (bp) | 569967
GC (%) | 41.71
N50 (bp) | 16525
N75 (bp) | 9369
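The read summary above can be recomputed directly from the FASTQ file. Below is a minimal Python sketch, assuming an uncompressed FASTQ with exactly one sequence line per record; the function names are hypothetical, and this is not how wtdbg2 or any particular QC tool computes these figures.

```python
import statistics
from typing import Dict, Iterable, List


def read_lengths_from_fastq(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each 4-line FASTQ record (assumes no line wrapping)."""
    lengths = []
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the 2nd line of every record holds the sequence
            lengths.append(len(line.strip()))
    return lengths


def n50(lengths: List[int]) -> int:
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def summarize(lengths: List[int]) -> Dict[str, float]:
    """Collect the same kind of statistics as the table above."""
    return {
        "total_reads": len(lengths),
        "total_bases": sum(lengths),
        "mean_length": round(statistics.mean(lengths), 1),
        "median_length": statistics.median(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "read_length_n50": n50(lengths),
    }
```

Usage: `with open("ab.fq") as fh: print(summarize(read_lengths_from_fastq(fh)))`. Note that a read-length N50 of 546 bp means half of all sequenced bases sit in reads of 546 bp or shorter.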

I mapped my FASTQ reads to the reference genome with minimap2: the depth is about 12x, and the reads cover almost the entire reference genome.

Your paper says:

Wtdbg2 filters out a k-mer occurring once or over 1,000 times in reads, and then scans the reads again to build a hash table for the remaining k-mers and their positions in bins. Wtdbg2 retains alignments no shorter than 8 × 256 bp.
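The frequency rule in that quote can be illustrated with a simplified k-mer counter. This is only a sketch: the function names are hypothetical, and it ignores how wtdbg2 actually selects, stores and indexes k-mers. It just shows the "drop k-mers seen once or more than 1,000 times" window.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across all reads (forward strand only)."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def filter_kmers(counts: Counter, max_count: int = 1000) -> Dict[str, int]:
    """Keep k-mers seen more than once (singletons are likely errors)
    and at most max_count times (very frequent ones are likely repeats)."""
    return {kmer: c for kmer, c in counts.items() if 1 < c <= max_count}
```

With short, error-prone reads, a larger share of k-mers tend to be singletons, so fewer k-mers per read survive this step.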

I guess my contigs are so short because my reads are too short (median length 328 bp), so many reads are discarded. Do you agree?
What parameters do you recommend for my data? Thank you very much!
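One way to check whether many reads are lost is to count the reads and bases that pass a length cutoff and convert the remaining bases to coverage. A minimal sketch, assuming the 1024 bp (4 × 256 bp) and 2048 bp (8 × 256 bp) figures from this thread as candidate cutoffs; the helper names are hypothetical, and wtdbg2's real filtering is not a single length threshold.

```python
from typing import List, Tuple

BIN_SIZE = 256  # bin size (bp) quoted in this thread


def surviving_reads(lengths: List[int], min_len: int) -> Tuple[int, int]:
    """Return (number of reads, total bases) for reads at least min_len bp long."""
    kept = [n for n in lengths if n >= min_len]
    return len(kept), sum(kept)


def coverage(total_bases: int, genome_size: int) -> float:
    """Average sequencing depth: total bases divided by genome size."""
    return total_bases / genome_size


def cutoff_report(lengths: List[int], genome_size: int) -> None:
    """Print reads, bases and coverage left at the assumed 4-bin and 8-bin cutoffs."""
    for n_bins in (4, 8):
        min_len = n_bins * BIN_SIZE
        reads, bases = surviving_reads(lengths, min_len)
        print(f">= {min_len} bp: {reads} reads, {bases} bp, "
              f"{coverage(bases, genome_size):.1f}x")
```

For the data above, `coverage(109230708, 4_300_000)` gives about 25.4x before any filtering; running `cutoff_report` on the real read lengths would show how much of that remains above each cutoff.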
lancer-lu commented 4 years ago
I found a related answer: you answered a similar question before.

In https://github.com/ruanjue/wtdbg2/issues/82#issue-421785342 you said:

Sorry, wtdbg2 cannot assemble those short reads, for it requires read length of at least 4 bins (4*256bp). Please have a try with other assemblers.

I'm a little confused by "at least 4 bins (4*256bp)". If the Nanopore reads are poor, what is the shortest read length wtdbg2 can accept? Your paper says:

Wtdbg2 retains alignments no shorter than 8 × 256 bp.
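Reading the two quotes side by side, they seem to describe different thresholds. In bin units (bin size $b = 256$ bp, node size $k = 4$ bins) they work out as below; this is an interpretation of the quotes, not a statement from the wtdbg2 documentation.

```latex
\begin{aligned}
L_{\text{node}} &= k \cdot b = 4 \times 256\ \text{bp} = 1024\ \text{bp} \\
L_{\text{aln}}  &= 8 \cdot b = 8 \times 256\ \text{bp} = 2048\ \text{bp} \\
\left\lfloor 328 / 256 \right\rfloor &= 1\ \text{bin} < k = 4
\end{aligned}
```

So a read at the median length of 328 bp spans a single complete bin, well below both thresholds.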

ruanjue commented 4 years ago

wtdbg2 uses a fuzzy Bruijn graph (FBG), in which each vertex is a k-bin, where k = 4. You can choose a different k with --node-len; please have a look at wtdbg2 --help. Still, I suggest using another assembler, such as Newbler, for such short long reads.
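The length consequence of "a vertex is a k-bin" can be sketched by cutting a read into 256 bp bins and listing every run of k consecutive bins. This is a simplified model under stated assumptions: the function names are hypothetical, and real FBG vertices also depend on k-mers shared between reads, which this ignores entirely.

```python
from typing import List, Tuple

BIN_SIZE = 256  # bin size (bp) quoted in this thread


def complete_bins(read_len: int, bin_size: int = BIN_SIZE) -> int:
    """Number of complete bins a read of read_len bp spans."""
    return read_len // bin_size


def kbin_vertices(read_len: int, k: int = 4,
                  bin_size: int = BIN_SIZE) -> List[Tuple[int, int]]:
    """Return (start_bin, end_bin) for every run of k consecutive bins in the read.

    A read with fewer than k complete bins yields no vertex at all.
    """
    n = complete_bins(read_len, bin_size)
    return [(i, i + k) for i in range(n - k + 1)]


for length in (328, 1024, 2048):
    print(length, "bp ->", len(kbin_vertices(length)), "k-bin vertices")
```

Under this model a 328 bp read contributes no vertex with k = 4, a 1024 bp read contributes exactly one, and a 2048 bp read contributes five. Lowering k would let shorter reads contribute vertices; whether that helps the assembly is a separate question, and the author still recommends another assembler for reads this short.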

lancer-lu commented 4 years ago

Thanks for your suggestion, I understand now! Given how the software works, k should be greater than or equal to 4.