ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly
GNU General Public License v3.0
513 stars 94 forks source link

Hanging when using one thread #42

Closed mrvollger closed 6 years ago

mrvollger commented 6 years ago

Hi,

I am doing a really small local assembly and noticed that the program gets stuck when I specify -t 1, and will run if I choose any other value of -t. For example

wtdbg2 -i group.0/WH.reads.fasta -f -o group.0/wtdbg.assembly/asm --ctg-min-length 10000 -L 5000 -t 1

hangs on the last line

--
-- total memory       49415848.0 kB
-- available          45630044.0 kB
-- 12 cores
-- Starting program: wtdbg2 -i group.0/WH.reads.fasta -f -o group.0/wtdbg.assembly/asm --ctg-min-length 10000 -L 5000 -t 1
-- pid                     26606
-- date         Wed Nov  7 11:23:57 2018
--
[Wed Nov  7 11:23:57 2018] loading reads
93 reads
[Wed Nov  7 11:23:57 2018] Done, 93 reads, 1674265 bp, 6492 bins
** PROC_STAT(0) **: real 0.013 sec, user 0.000 sec, sys 0.000 sec, maxrss 20372.0 kB, maxvsize 98308.0 kB
[Wed Nov  7 11:23:57 2018] generating nodes, 1 threads
[Wed Nov  7 11:23:57 2018] indexing bins[0,6492] (1661952 bp), 1 threads
[Wed Nov  7 11:23:57 2018] - scanning kmers (K0P21 subsampling 1/4) from 6492 bins
6492 bins
** PROC_STAT(0) **: real 0.113 sec, user 0.080 sec, sys 0.020 sec, maxrss 49528.0 kB, maxvsize 258436.0 kB
[Wed Nov  7 11:23:57 2018] - Total kmers = 17559
[Wed Nov  7 11:23:57 2018] - average kmer depth = 5
[Wed Nov  7 11:23:57 2018] - 163936 low frequency kmers (<2)
[Wed Nov  7 11:23:57 2018] - 0 high frequency kmers (>1000)
[Wed Nov  7 11:23:57 2018] - indexing 17559 kmers, 91195 instances (at most)
6492 bins
[Wed Nov  7 11:23:57 2018] - indexed  17559 kmers, 91015 instances
[Wed Nov  7 11:23:57 2018] - masked 635 bins as closed
[Wed Nov  7 11:23:57 2018] - sorting
** PROC_STAT(0) **: real 0.113 sec, user 0.080 sec, sys 0.020 sec, maxrss 49528.0 kB, maxvsize 258436.0 kB
[Wed Nov  7 11:23:57 2018] Done
93 reads|total hits 2348
** PROC_STAT(0) **: real 0.317 sec, user 0.270 sec, sys 0.030 sec, maxrss 51068.0 kB, maxvsize 259104.0 kB
[Wed Nov  7 11:23:58 2018] chainning ...  412 hits into 202
[Wed Nov  7 11:23:58 2018] picking best 500 hits for each read ... 2138 hits
[Wed Nov  7 11:23:58 2018] clipping ... 0.00% bases
[Wed Nov  7 11:23:58 2018] generated 34061 regs
[Wed Nov  7 11:23:58 2018] sorting regs ...  Done
[Wed Nov  7 11:23:58 2018] generating intervals ...  1471 intervals
[Wed Nov  7 11:23:58 2018] selecting important intervals from 1471 intervals
[Wed Nov  7 11:23:58 2018] Intervals: kept 29, discarded 1442
** PROC_STAT(0) **: real 0.317 sec, user 0.270 sec, sys 0.030 sec, maxrss 51068.0 kB, maxvsize 259104.0 kB
[Wed Nov  7 11:23:58 2018] Done, 29 nodes
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.1.nodes". Done.
[Wed Nov  7 11:23:58 2018] median node depth = 16
[Wed Nov  7 11:23:58 2018] masked 0 high coverage nodes (>200 or <3)
[Wed Nov  7 11:23:58 2018] masked 1 repeat-like nodes by local subgraph analysis
[Wed Nov  7 11:23:58 2018] generating edges
[Wed Nov  7 11:23:58 2018] Done, 423 edges
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.1.reads". Done.
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.1.dot". Done.
[Wed Nov  7 11:23:58 2018] graph clean
[Wed Nov  7 11:23:58 2018] rescued 0 low cov edges
[Wed Nov  7 11:23:58 2018] deleted 0 binary edges
[Wed Nov  7 11:23:58 2018] deleted 1 isolated nodes
[Wed Nov  7 11:23:58 2018] cut 21 transitive edges
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.2.dot". Done.
[Wed Nov  7 11:23:58 2018] 2 bubbles; 2 tips; 0 yarns;
[Wed Nov  7 11:23:58 2018] deleted 1 isolated nodes
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.3.dot". Done.
[Wed Nov  7 11:23:58 2018] cut 0 branching nodes
[Wed Nov  7 11:23:58 2018] deleted 0 isolated nodes
[Wed Nov  7 11:23:58 2018] building unitigs
[Wed Nov  7 11:23:58 2018] TOT 41472, CNT 1, AVG 41472, MAX 41472, N50 41472, L50 1, N90 41472, L90 1, Min 41472
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.frg.nodes". Done.
[Wed Nov  7 11:23:58 2018] generating links
[Wed Nov  7 11:23:58 2018] generated 1 links
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.frg.dot". Done.
[Wed Nov  7 11:23:58 2018] rescue 0 weak links
[Wed Nov  7 11:23:58 2018] deleted 2 binary links
[Wed Nov  7 11:23:58 2018] cut 0 transitive links
[Wed Nov  7 11:23:58 2018] remove 0 boomerangs
[Wed Nov  7 11:23:58 2018] detached 0 repeat-associated paths
[Wed Nov  7 11:23:58 2018] remove 0 weak branches
[Wed Nov  7 11:23:58 2018] cut 0 tips
[Wed Nov  7 11:23:58 2018] pop 0 bubbles
[Wed Nov  7 11:23:58 2018] cut 0 tips
[Wed Nov  7 11:23:58 2018] output "group.0/wtdbg.assembly/asm.ctg.dot". Done.
[Wed Nov  7 11:23:58 2018] building contigs
[Wed Nov  7 11:23:58 2018] searched 1 contigs
[Wed Nov  7 11:23:58 2018] Estimated: TOT 41472, CNT 1, AVG 41472, MAX 41472, N50 41472, L50 1, N90 41472, L90 1, Min 41472

However,

wtdbg2 -i group.0/WH.reads.fasta -f -o group.0/wtdbg.assembly/asm --ctg-min-length 10000 -L 5000 -t 2

completes:

--
-- total memory       49415848.0 kB
-- available          45653020.0 kB
-- 12 cores
-- Starting program: wtdbg2 -i group.0/WH.reads.fasta -f -o group.0/wtdbg.assembly/asm --ctg-min-length 10000 -L 5000 -t 2
-- pid                     29131
-- date         Wed Nov  7 11:27:23 2018
--
[Wed Nov  7 11:27:23 2018] loading reads
93 reads
[Wed Nov  7 11:27:23 2018] Done, 93 reads, 1674265 bp, 6492 bins
** PROC_STAT(0) **: real 0.020 sec, user 0.010 sec, sys 0.000 sec, maxrss 32736.0 kB, maxvsize 110772.0 kB
[Wed Nov  7 11:27:23 2018] generating nodes, 2 threads
[Wed Nov  7 11:27:23 2018] indexing bins[0,6492] (1661952 bp), 2 threads
[Wed Nov  7 11:27:23 2018] - scanning kmers (K0P21 subsampling 1/4) from 6492 bins
6492 bins
** PROC_STAT(0) **: real 0.121 sec, user 0.100 sec, sys 0.010 sec, maxrss 56000.0 kB, maxvsize 394128.0 kB
[Wed Nov  7 11:27:23 2018] - Total kmers = 17559
[Wed Nov  7 11:27:23 2018] - average kmer depth = 5
[Wed Nov  7 11:27:23 2018] - 163936 low frequency kmers (<2)
[Wed Nov  7 11:27:23 2018] - 0 high frequency kmers (>1000)
[Wed Nov  7 11:27:23 2018] - indexing 17559 kmers, 91195 instances (at most)
6492 bins
[Wed Nov  7 11:27:23 2018] - indexed  17559 kmers, 91015 instances
[Wed Nov  7 11:27:23 2018] - masked 635 bins as closed
[Wed Nov  7 11:27:23 2018] - sorting
** PROC_STAT(0) **: real 0.121 sec, user 0.100 sec, sys 0.010 sec, maxrss 56000.0 kB, maxvsize 394128.0 kB
[Wed Nov  7 11:27:23 2018] Done
93 reads|total hits 2348
** PROC_STAT(0) **: real 0.330 sec, user 0.290 sec, sys 0.020 sec, maxrss 58340.0 kB, maxvsize 394424.0 kB
[Wed Nov  7 11:27:23 2018] chainning ...  412 hits into 202
[Wed Nov  7 11:27:23 2018] picking best 500 hits for each read ... 2138 hits
[Wed Nov  7 11:27:23 2018] clipping ... 0.00% bases
[Wed Nov  7 11:27:23 2018] generated 34061 regs
[Wed Nov  7 11:27:23 2018] sorting regs ...  Done
[Wed Nov  7 11:27:23 2018] generating intervals ...  1471 intervals
[Wed Nov  7 11:27:23 2018] selecting important intervals from 1471 intervals
[Wed Nov  7 11:27:23 2018] Intervals: kept 29, discarded 1442
** PROC_STAT(0) **: real 0.330 sec, user 0.290 sec, sys 0.020 sec, maxrss 58340.0 kB, maxvsize 394424.0 kB
[Wed Nov  7 11:27:23 2018] Done, 29 nodes
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.1.nodes". Done.
[Wed Nov  7 11:27:23 2018] median node depth = 16
[Wed Nov  7 11:27:23 2018] masked 0 high coverage nodes (>200 or <3)
[Wed Nov  7 11:27:23 2018] masked 1 repeat-like nodes by local subgraph analysis
[Wed Nov  7 11:27:23 2018] generating edges
[Wed Nov  7 11:27:23 2018] Done, 423 edges
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.1.reads". Done.
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.1.dot". Done.
[Wed Nov  7 11:27:23 2018] graph clean
[Wed Nov  7 11:27:23 2018] rescued 0 low cov edges
[Wed Nov  7 11:27:23 2018] deleted 0 binary edges
[Wed Nov  7 11:27:23 2018] deleted 1 isolated nodes
[Wed Nov  7 11:27:23 2018] cut 21 transitive edges
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.2.dot". Done.
[Wed Nov  7 11:27:23 2018] 2 bubbles; 2 tips; 0 yarns;
[Wed Nov  7 11:27:23 2018] deleted 1 isolated nodes
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.3.dot". Done.
[Wed Nov  7 11:27:23 2018] cut 0 branching nodes
[Wed Nov  7 11:27:23 2018] deleted 0 isolated nodes
[Wed Nov  7 11:27:23 2018] building unitigs
[Wed Nov  7 11:27:23 2018] TOT 41472, CNT 1, AVG 41472, MAX 41472, N50 41472, L50 1, N90 41472, L90 1, Min 41472
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.frg.nodes". Done.
[Wed Nov  7 11:27:23 2018] generating links
[Wed Nov  7 11:27:23 2018] generated 1 links
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.frg.dot". Done.
[Wed Nov  7 11:27:23 2018] rescue 0 weak links
[Wed Nov  7 11:27:23 2018] deleted 2 binary links
[Wed Nov  7 11:27:23 2018] cut 0 transitive links
[Wed Nov  7 11:27:23 2018] remove 0 boomerangs
[Wed Nov  7 11:27:23 2018] detached 0 repeat-associated paths
[Wed Nov  7 11:27:23 2018] remove 0 weak branches
[Wed Nov  7 11:27:23 2018] cut 0 tips
[Wed Nov  7 11:27:23 2018] pop 0 bubbles
[Wed Nov  7 11:27:23 2018] cut 0 tips
[Wed Nov  7 11:27:23 2018] output "group.0/wtdbg.assembly/asm.ctg.dot". Done.
[Wed Nov  7 11:27:23 2018] building contigs
[Wed Nov  7 11:27:23 2018] searched 1 contigs
[Wed Nov  7 11:27:23 2018] Estimated: TOT 41472, CNT 1, AVG 41472, MAX 41472, N50 41472, L50 1, N90 41472, L90 1, Min 41472
[Wed Nov  7 11:27:23 2018] output 1 contigs
[Wed Nov  7 11:27:23 2018] Program Done
** PROC_STAT(TOTAL) **: real 0.430 sec, user 0.340 sec, sys 0.020 sec, maxrss 58340.0 kB, maxvsize 394424.0 kB
---

Here is the read file: WH.reads.fasta.gz

Thanks in advance!

ruanjue commented 6 years ago

Hi,

I tested it using wtdbg2 -t 1, but not hanged.

$> wtdbg2 -h
...
Version: 2.2 (20181101)
Compiled: Sat Nov 3 17:27:34 CST 2018
...

Please tell me which commit in wtdbg2 have you downloaded?

Best, Jue

mrvollger commented 6 years ago

Ahh, I am using 2.1 installed through conda.

WTDBG: De novo assembler for long noisy sequences
Author: Jue Ruan <ruanjue@gmail.com>
Version: 2.1 (20181007)
Compiled: Tue Oct 23 02:33:44 UTC 2018
Usage: wtdbg2 [options]

Will there be an updated conda release soon?

ruanjue commented 6 years ago

I see, there used to have a BUG in bufferedwriter in previous commit, fixed in 2.2.

I will release 2.2 after finish the improvement of consensus module.

mrvollger commented 6 years ago

Great. Thanks!