ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0

Why do contigs end? #115

chklopp closed this issue 5 years ago

chklopp commented 5 years ago

Is there a way to know why the assembler decided to end a contig? There could be several reasons:

Can this information be retrieved from wtdbg2 files?

ruanjue commented 5 years ago

There are three files named <prefix>.1/2/3.dot.gz: 1 is the initial graph, 2 is the graph after transitive edges are reduced (easier to view), and 3 is the final graph. Two helper scripts are provided: scripts/dbm_index_dot.pl and scripts/dbm_read_dot.pl. <prefix>.frg.nodes gives the node names of the unitigs. At the end of <prefix>.events, you will see how contigs are built from unitigs (named F\d+).

In short: contig -> unitigs -> nodes -> dot graph.

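pgzf -fd dbg.*.dot.gz # decompress the three dot graphs
dbm_index_dot.pl dbg.3.dot # index the final graph for dbm_read_dot.pl
head -1 dbg.frg.nodes # suppose the end node is 'N111'
dbm_read_dot.pl -l 20 dbg.3.dot N111 >1.dot && dot -Tpdf -O 1.dot # subgraph around N111; see 1.dot.pdf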
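Picking the end node by eye from `head -1 dbg.frg.nodes` gets tedious for many unitigs. Below is a minimal sketch of a hypothetical helper, `unitig_end_nodes`, that prints the first and last node of every unitig line. It assumes (not confirmed here) that each line of <prefix>.frg.nodes starts with the unitig name (e.g. F0) and mentions its nodes as tokens beginning with N followed by digits, possibly with a suffix such as a strand sign.

```shell
# Hypothetical helper, not part of wtdbg2.
# ASSUMPTION: field 1 of each frg.nodes line is the unitig name, and node IDs
# appear in later fields as tokens starting with N<digits> (suffixes ignored).
unitig_end_nodes() {
  awk '{
    first = ""; last = ""
    for (i = 2; i <= NF; i++) {
      # keep only the leading N<digits> part of each node token
      if (match($i, /^N[0-9]+/)) {
        id = substr($i, RSTART, RLENGTH)
        if (first == "") first = id
        last = id
      }
    }
    # print: unitig  first_node  last_node
    if (first != "") print $1, first, last
  }' "$@"
}
```

For example, `unitig_end_nodes dbg.frg.nodes | head -1` would print the first unitig with its two end nodes. Which end is the contig end depends on the unitig's orientation in the contig, so either node may be the one to pass to dbm_read_dot.pl in place of N111.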

Jue

chklopp commented 5 years ago

Thank you, I succeeded in drawing the contig's node graph, but it does not really answer my question. How do I know why the last nodes do not find a (next) neighbor?

The events file is rather cryptic:

F18[-:1] -> F48[+:0] = 11008, cov=1
F18[-:1] -> F48[+:0] = 11264, cov=1
F18[-:1] -> F48[+:0] = 11264, cov=1
F18[-:0] -> F48[+:0] = 24576, cov=1
F18[-:1] -> F48[+:0] = 24832, cov=1
F35[-:0] -> F55[+:0] = 11008, cov=1
F35[-:0] -> F55[+:0] = 11264, cov=1
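One way to make these lines less cryptic is to group them by unitig pair. The sketch below, a hypothetical helper `summarise_edges`, keys each record by unitig name plus the strand sign inside the brackets, then counts records and sums the `cov=` values. It assumes the line layout is exactly as in the excerpt above; reading `cov=` as read support is an assumption, and the number after `=` is passed over without interpretation.

```shell
# Hypothetical helper, not part of wtdbg2.
# ASSUMPTION: lines look like "F18[-:1] -> F48[+:0] = 11008, cov=1".
summarise_edges() {
  awk '
    # unitig name before "[" and the strand sign right after it
    function name(s)   { return substr(s, 1, index(s, "[") - 1) }
    function strand(s) { return substr(s, index(s, "[") + 1, 1) }
    $2 == "->" && $6 ~ /^cov=/ {
      key = name($1) strand($1) " -> " name($3) strand($3)
      cov = $6; sub(/^cov=/, "", cov)
      n[key]++; total[key] += cov
    }
    END { for (k in n) printf "%s records=%d cov_sum=%d\n", k, n[k], total[k] }
  ' "$@" | sort
}
```

Run on the seven lines above, this would report five records for F18- -> F48+ and two for F35- -> F55+, each with a coverage sum equal to its record count, which makes it easy to spot unitig junctions supported by only a handful of reads.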

ruanjue commented 5 years ago

Please pay attention to these lines at the end of the file:

ctg0        F0          -           0
ctg1        F1          -           0
OUTPUT_CTG  ctg0 -> ctg1 nodes=4958 len=5219328
OUTPUT_CTG  ctg1 -> ctg2 nodes=101 len=148992
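Following the contig -> unitigs -> nodes chain, the first step is finding which unitig each contig ends with. Below is a minimal sketch of a hypothetical helper, `contig_end_unitigs`, that reads such lines and prints each contig's first and last unitig. It assumes (not confirmed by the thread) that column 1 is the contig, column 2 the unitig, column 3 its strand, and that the rows of one contig appear in layout order; OUTPUT_CTG and edge lines are skipped.

```shell
# Hypothetical helper, not part of wtdbg2.
# ASSUMPTION: rows look like "ctg0  F0  -  0" (contig, unitig, strand, ...)
# and are listed in layout order within each contig.
contig_end_unitigs() {
  awk '
    $1 ~ /^ctg[0-9]+$/ && $2 ~ /^F[0-9]+$/ {
      # remember the first unitig and the order in which contigs appear
      if (!($1 in first)) { first[$1] = $2 $3; order[++n] = $1 }
      # keep overwriting so the last row seen wins
      last[$1] = $2 $3
    }
    END {
      for (i = 1; i <= n; i++)
        print order[i], "first=" first[order[i]], "last=" last[order[i]]
    }
  ' "$@"
}
```

For example, `contig_end_unitigs dbg.events` would list each contig's end unitigs; their node names can then be looked up in dbg.frg.nodes and drawn with dbm_read_dot.pl as in the commands above.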