ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

abnormal node depth ??? #53

Closed: bitcometz closed 5 years ago

bitcometz commented 5 years ago

Hello,

I used wtdbg2 to assemble a ~2.8 Gbp genome from ~30X PacBio data (read length cutoff: 7000).

Parameters for kbm2: -p 0 -k 15 -S 2 -m 300

Parameters for wtdbg2: --node-drop 0.25 --node-len 1024 --node-max 100 --aln-dovetail -1

Log output:

Done, 5992448 reads (>=0 bp), 87876681500 bp, 340291433 bins
[Mon Dec 10 15:19:57 2018] chainning ... 1796935 hits into 896135, deleted 13977831 non-best hits between two reads
[Mon Dec 10 15:20:08 2018] picking best 500 hits for each read ... 178840586 hits
[Mon Dec 10 15:20:23 2018] clipping ... 14.39% bases
[Mon Dec 10 15:24:48 2018] generated 859464418 regs
[Mon Dec 10 15:25:00 2018] sorting regs ... Done
[Mon Dec 10 15:25:32 2018] generating intervals ... 30385993 intervals
[Mon Dec 10 15:25:39 2018] selecting important intervals from 30385993 intervals
[Mon Dec 10 15:29:02 2018] Intervals: kept 1146431, discarded 29239562
[Mon Dec 10 15:29:12 2018] median node depth = 7
[Mon Dec 10 15:29:12 2018] masked 19859 high coverage nodes (>100 or <3)
[Mon Dec 10 15:29:14 2018] masked 76516 repeat-like nodes by local subgraph analysis
[Mon Dec 10 15:29:14 2018] generating edges
[Mon Dec 10 15:29:26 2018] Done, 4335269 edges

[Mon Dec 10 15:30:25 2018] Estimated: TOT 1712349952, CNT 45608, AVG 37545, MAX 6525440, N50 73728, L50 2748, N90 13312, L90 26733, Min

The median node depth is only 7, which I think is abnormal and is probably responsible for the low N50. Could you give me some advice on how to improve my genome assembly? Thanks!

Best
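For readers unfamiliar with the N50, L50, N90 and L90 figures on the "Estimated" line, here is a minimal sketch of the conventional definition. It is not taken from the wtdbg2 source, so whether wtdbg2 computes its estimate in exactly this way is an assumption. The function name and the toy lengths are made up for illustration.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running total of
    lengths, taken from longest to shortest, first reaches `fraction`
    of the total assembly size. Lx is how many contigs that took.
    """
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * fraction
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    raise ValueError("no contig lengths given")


# Toy contig lengths, invented for illustration (not from the log).
toy = [100, 80, 60, 40, 20]
n50, l50 = nx_lx(toy)        # half of 300 bp is reached at the 2nd contig
n90, l90 = nx_lx(toy, 0.9)   # 90% of 300 bp is reached at the 4th contig
print(n50, l50, n90, l90)
```

Under this definition, an N50 of 73728 with an L50 of 2748 would mean that the 2748 longest contigs, each at least 73728 bp, together cover half of the estimated 1712349952 bp.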

ruanjue commented 5 years ago

If you can find a machine with 256 GB of RAM, please run wtdbg2 with -x sq. Also, I don't think you will get a good assembly from 30X Sequel data.
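As a rough check on the 30X figure, the depth can be recomputed from the numbers in the wtdbg2 log posted above. This is only back-of-the-envelope arithmetic: the genome size is the poster's own estimate, and applying the clipped percentage directly to the loaded bases is an assumption about what the "clipping" line measures.

```python
# Numbers copied from the wtdbg2 log earlier in the thread.
total_bp = 87_876_681_500               # "87876681500 bp" loaded
clipped_fraction = 0.1439               # "clipping ... 14.39% bases"
estimated_assembly_bp = 1_712_349_952   # "TOT" on the "Estimated" line

# The poster's estimate, not a measured value.
genome_bp = 2.8e9                       # "~2.8Gbp"

raw_depth = total_bp / genome_bp
# Assumption: the clipped percentage applies to all loaded bases.
depth_after_clipping = raw_depth * (1 - clipped_fraction)
assembled_fraction = estimated_assembly_bp / genome_bp

print(f"raw depth:            {raw_depth:.1f}X")
print(f"depth after clipping: {depth_after_clipping:.1f}X")
print(f"estimated assembly:   {assembled_fraction:.0%} of the expected genome size")
```

With these inputs the raw depth comes out at a little over 31X, the depth after clipping at roughly 27X, and the estimated assembly size at about 61% of the expected 2.8 Gbp.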

bitcometz commented 5 years ago

Thanks! I will give it a try. I have another question, about the kbm2 alignment:

I set -K 100 for the kbm2 alignment. Why is it automatically changed to 13322, which seems really high?

-- Starting program: ./kbm2 -m 300 -K 100 -n 100 -p 0 -k 15 -S 2 ....
-- pid 2928
-- date Tue Dec 4 11:31:01 2018

[Tue Dec 4 11:31:01 2018] loading sequences
[Tue Dec 4 11:33:54 2018] 1066450 sequences, 13972748544 bp, 54581049 bins
[Tue Dec 4 11:33:54 2018] indexing, 10 threads
[Tue Dec 4 11:33:54 2018] - scanning kmers (K15P0S2.00) from 54581049 bins

PROC_STAT(0) : real 542.740 sec, user 2642.220 sec, sys 41.570 sec, maxrss 9425032.0 kB, maxvsize 10264504.0 kB
[Tue Dec 4 11:40:04 2018] - high frequency kmer depth is set to 13322
[Tue Dec 4 11:40:04 2018] - Total kmers = 264278618
[Tue Dec 4 11:40:04 2018] - average kmer depth = 23
[Tue Dec 4 11:40:04 2018] - 0 low frequency kmers (<1)
[Tue Dec 4 11:40:04 2018] - 11048 high frequency kmers (>13322)
[Tue Dec 4 11:40:04 2018] - indexing 264267570 kmers, 6161602360 instances (at most)
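The figures in this kbm2 log can be cross-checked against each other with simple arithmetic. The sketch below only reuses numbers printed in the log. Reading "instances (at most)" as an upper bound on k-mer occurrences is an assumption, so the last ratio is only a loose comparison with the reported average depth of 23.

```python
# Numbers copied from the kbm2 log above.
loaded_bp = 13_972_748_544
bins = 54_581_049
total_kmers = 264_278_618
low_freq_kmers = 0            # "0 low frequency kmers (<1)"
high_freq_kmers = 11_048      # "11048 high frequency kmers (>13322)"
indexed_kmers = 264_267_570
max_instances = 6_161_602_360

# Base pairs per bin implied by the "loading sequences" line.
bp_per_bin = loaded_bp / bins

# K-mers left after dropping the low and high frequency ones.
remaining_kmers = total_kmers - low_freq_kmers - high_freq_kmers

# Assumption: "instances (at most)" bounds the occurrences of the
# indexed k-mers, so this ratio should be close to the average depth.
instances_per_kmer = max_instances / indexed_kmers

print(bp_per_bin)
print(remaining_kmers == indexed_kmers)
print(round(instances_per_kmer, 1))
```

The checks agree with the log: each bin spans 256 bp, the indexed k-mer count equals the total minus the 11048 filtered high frequency k-mers, and the ratio of instances to indexed k-mers is a little over 23. So the cutoff of 13322 reported in the log removed only a very small share of the distinct k-mers in this run.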

ruanjue commented 5 years ago

The second question is fixed. See https://github.com/ruanjue/wtdbg2/commit/5009cffdbf09093428577cf5a4db2122ccfb6532 .