shunhuahan closed this issue 4 years ago
wtdbg2 uses `n` cores to perform read alignment. If the `-A` option is provided, the result will be the same whatever `n` is. Without `-A`, wtdbg2 skips contained reads. Suppose there are two reads A and B, and A contains B. With multiple cores, whether B is skipped depends on whether another core has already finished A's alignments, so the outcome is nondeterministic.
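The effect of that ordering dependence can be illustrated with a toy model. This is a minimal sketch, not wtdbg2's actual code: the `CONTAINS` table, the function names, and the idea of treating each read's alignment as a single atomic step are all assumptions made for illustration. It enumerates every order in which the reads could be reached and collects the distinct results.

```python
from itertools import permutations

# Hypothetical toy model, not wtdbg2's implementation:
# CONTAINS records which reads are fully contained in another read.
CONTAINS = {"A": {"B"}}  # read A contains read B


def aligned_reads(finish_order, align_all):
    """Return the set of reads that get aligned for one completion order.

    finish_order: order in which worker threads happen to reach the reads.
    align_all:    True mimics the effect of -A (contained reads are never skipped).
    """
    aligned = set()
    for read in finish_order:
        # A read is skipped only if a read that contains it was aligned earlier.
        contained = any(read in CONTAINS.get(other, ()) for other in aligned)
        if align_all or not contained:
            aligned.add(read)
    return aligned


def possible_outcomes(reads, align_all):
    """Collect the distinct results over every possible completion order."""
    return {frozenset(aligned_reads(order, align_all))
            for order in permutations(reads)}
```

In this model, `possible_outcomes(["A", "B"], align_all=False)` contains two different results (B skipped, or B aligned), while `align_all=True` leaves only one. A single-threaded run always visits reads in the same order, so it corresponds to just one of those orders.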
Thanks for your explanation!
I tried again, adding `-A` to both jobs, and I got the same issue: 4 cores generated one contig while 1 core generated 0 contigs.
Here are the stdout and output files using 4 cores.
test1$ wtdbg2 -x rs -g 30k -A -t 4 -i chr2L_7202325_7204081.reads.fa -fo chr2L_7202325_7204081
--
-- total memory 263724796.0 kB
-- available 246639928.0 kB
-- 28 cores
-- Starting program: wtdbg2 -x rs -g 30k -A -t 4 -i chr2L_7202325_7204081.reads.fa -fo chr2L_7202325_7204081
-- pid 188802
-- date Mon Mar 23 21:16:34 2020
--
[Mon Mar 23 21:16:34 2020] loading reads
21 reads
[Mon Mar 23 21:16:34 2020] Done, 21 reads (>=5000 bp), 395344 bp, 1535 bins
** PROC_STAT(0) **: real 0.001 sec, user 0.000 sec, sys 0.000 sec, maxrss 1052.0 kB, maxvsize 80068.0 kB
[Mon Mar 23 21:16:34 2020] Set --edge-cov to 2
KEY PARAMETERS: -k 0 -p 21 -K 1000.049988 -A -S 4.000000 -s 0.050000 -g 30000 -X 50.000000 -e 2 -L 5000
[Mon Mar 23 21:16:34 2020] generating nodes, 4 threads
[Mon Mar 23 21:16:34 2020] indexing bins[(0,1535)/1535] (392960/392960 bp), 4 threads
[Mon Mar 23 21:16:34 2020] - scanning kmers (K0P21S4.00) from 1535 bins
1535 bins
********************** Kmer Frequency **********************
[ASCII histogram of k-mer frequencies]
********************** 1 - 201 **********************
Quatiles:
10% 20% 30% 40% 50% 60% 70% 80% 90% 95%
1 1 1 1 1 1 1 1 2 4
** PROC_STAT(0) **: real 0.001 sec, user 0.000 sec, sys 0.000 sec, maxrss 1052.0 kB, maxvsize 80068.0 kB
[Mon Mar 23 21:16:34 2020] - high frequency kmer depth is set to 1000
[Mon Mar 23 21:16:34 2020] - Total kmers = 56675
[Mon Mar 23 21:16:34 2020] - average kmer depth = 3
[Mon Mar 23 21:16:34 2020] - 53969 low frequency kmers (<2)
[Mon Mar 23 21:16:34 2020] - 0 high frequency kmers (>1000)
[Mon Mar 23 21:16:34 2020] - indexing 2706 kmers, 8610 instances (at most)
1535 bins
[Mon Mar 23 21:16:34 2020] - indexed 2706 kmers, 8603 instances
[Mon Mar 23 21:16:34 2020] - masked 725 bins as closed
[Mon Mar 23 21:16:34 2020] - sorting
** PROC_STAT(0) **: real 0.001 sec, user 0.000 sec, sys 0.000 sec, maxrss 1052.0 kB, maxvsize 80068.0 kB
[Mon Mar 23 21:16:34 2020] Done
20 reads|total hits 136
** PROC_STAT(0) **: real 0.101 sec, user 0.100 sec, sys 0.040 sec, maxrss 68600.0 kB, maxvsize 411140.0 kB
[Mon Mar 23 21:16:34 2020] sorting rdhits ... Done
[Mon Mar 23 21:16:34 2020] clipping ... 72.18% bases
[Mon Mar 23 21:16:34 2020] generating regs ... 1508
[Mon Mar 23 21:16:34 2020] sorting regs ... Done
[Mon Mar 23 21:16:34 2020] generating intervals ... 206 intervals
[Mon Mar 23 21:16:34 2020] selecting important intervals from 206 intervals
[Mon Mar 23 21:16:34 2020] Intervals: kept 24, discarded 182
** PROC_STAT(0) **: real 0.101 sec, user 0.100 sec, sys 0.040 sec, maxrss 68600.0 kB, maxvsize 411140.0 kB
[Mon Mar 23 21:16:34 2020] Done, 24 nodes
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.1.nodes". Done.
[Mon Mar 23 21:16:34 2020] median node depth = 5
[Mon Mar 23 21:16:34 2020] masked 0 high coverage nodes (>200 or <2)
[Mon Mar 23 21:16:34 2020] masked 0 repeat-like nodes by local subgraph analysis
[Mon Mar 23 21:16:34 2020] generating edges
[Mon Mar 23 21:16:34 2020] Done, 56 edges
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.1.reads". Done.
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.1.dot.gz". Done.
[Mon Mar 23 21:16:34 2020] graph clean
[Mon Mar 23 21:16:34 2020] rescued 0 low cov edges
[Mon Mar 23 21:16:34 2020] deleted 0 binary edges
[Mon Mar 23 21:16:34 2020] deleted 13 isolated nodes
[Mon Mar 23 21:16:34 2020] cut 1 transitive edges
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.2.dot.gz". Done.
[Mon Mar 23 21:16:34 2020] 0 bubbles; 0 tips; 0 yarns; rescued 1 high edges
[Mon Mar 23 21:16:34 2020] deleted 1 isolated nodes
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.3.dot.gz". Done.
[Mon Mar 23 21:16:34 2020] cut 0 branching nodes
[Mon Mar 23 21:16:34 2020] deleted 0 isolated nodes
[Mon Mar 23 21:16:34 2020] building unitigs
[Mon Mar 23 21:16:34 2020] TOT 12544, CNT 1, AVG 12544, MAX 12544, N50 12544, L50 1, N90 12544, L90 1, Min 12544
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.frg.nodes". Done.
[Mon Mar 23 21:16:34 2020] generating links
[Mon Mar 23 21:16:34 2020] generated 1 links
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.frg.dot.gz". Done.
[Mon Mar 23 21:16:34 2020] rescue 0 weak links
[Mon Mar 23 21:16:34 2020] deleted 2 binary links
[Mon Mar 23 21:16:34 2020] cut 0 transitive links
[Mon Mar 23 21:16:34 2020] remove 0 boomerangs
[Mon Mar 23 21:16:34 2020] remove 0 weak branches
[Mon Mar 23 21:16:34 2020] cut 0 tips
[Mon Mar 23 21:16:34 2020] pop 0 bubbles
[Mon Mar 23 21:16:34 2020] detached 0 repeat-associated paths
[Mon Mar 23 21:16:34 2020] cut 0 tips
[Mon Mar 23 21:16:34 2020] output "chr2L_7202325_7204081.ctg.dot.gz". Done.
[Mon Mar 23 21:16:34 2020] building contigs
[Mon Mar 23 21:16:34 2020] searched 1 contigs
[Mon Mar 23 21:16:34 2020] Estimated: TOT 12544, CNT 1, AVG 12544, MAX 12544, N50 12544, L50 1, N90 12544, L90 1, Min 12544
[Mon Mar 23 21:16:34 2020] output 1 contigs
[Mon Mar 23 21:16:34 2020] Program Done
** PROC_STAT(TOTAL) **: real 0.201 sec, user 0.200 sec, sys 0.060 sec, maxrss 68600.0 kB, maxvsize 468080.0 kB
---
total 1.2M
388K -rw-r--r--. 1 sh60271 cmblab 387K Mar 23 21:16 chr2L_7202325_7204081.reads.fa
632K -rw-r--r--. 1 sh60271 cmblab 630K Mar 23 21:16 chr2L_7202325_7204081.kmerdep
4.0K -rw-r--r--. 1 sh60271 cmblab 93 Mar 23 21:16 chr2L_7202325_7204081.frg.nodes
4.0K -rw-r--r--. 1 sh60271 cmblab 198 Mar 23 21:16 chr2L_7202325_7204081.frg.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 274 Mar 23 21:16 chr2L_7202325_7204081.events
52K -rw-r--r--. 1 sh60271 cmblab 51K Mar 23 21:16 chr2L_7202325_7204081.ctg.lay.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 154 Mar 23 21:16 chr2L_7202325_7204081.ctg.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 702 Mar 23 21:16 chr2L_7202325_7204081.clps
4.0K -rw-r--r--. 1 sh60271 cmblab 537 Mar 23 21:16 chr2L_7202325_7204081.binkmer
8.0K -rw-r--r--. 1 sh60271 cmblab 4.2K Mar 23 21:16 chr2L_7202325_7204081.alignments.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 369 Mar 23 21:16 chr2L_7202325_7204081.3.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 391 Mar 23 21:16 chr2L_7202325_7204081.2.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 2.4K Mar 23 21:16 chr2L_7202325_7204081.1.reads
8.0K -rw-r--r--. 1 sh60271 cmblab 4.7K Mar 23 21:16 chr2L_7202325_7204081.1.nodes
4.0K -rw-r--r--. 1 sh60271 cmblab 1.1K Mar 23 21:16 chr2L_7202325_7204081.1.dot.gz
And here are the stdout and output files using 1 core.
[Mon Mar 23 21:18:38 2020] loading reads
21 reads
[Mon Mar 23 21:18:38 2020] Done, 21 reads (>=5000 bp), 395344 bp, 1535 bins
** PROC_STAT(0) **: real 0.001 sec, user 0.000 sec, sys 0.000 sec, maxrss 1048.0 kB, maxvsize 80068.0 kB
[Mon Mar 23 21:18:38 2020] Set --edge-cov to 2
KEY PARAMETERS: -k 0 -p 21 -K 1000.049988 -A -S 4.000000 -s 0.050000 -g 30000 -X 50.000000 -e 2 -L 5000
[Mon Mar 23 21:18:38 2020] generating nodes, 1 threads
[Mon Mar 23 21:18:38 2020] indexing bins[(0,1535)/1535] (392960/392960 bp), 1 threads
[Mon Mar 23 21:18:38 2020] - scanning kmers (K0P21S4.00) from 1535 bins
1535 bins
** Kmer Frequency **
---
total 1.1M
388K -rw-r--r--. 1 sh60271 cmblab 387K Mar 23 21:16 chr2L_7202325_7204081.reads.fa
632K -rw-r--r--. 1 sh60271 cmblab 630K Mar 23 21:18 chr2L_7202325_7204081.kmerdep
0 -rw-r--r--. 1 sh60271 cmblab 0 Mar 23 21:18 chr2L_7202325_7204081.frg.nodes
4.0K -rw-r--r--. 1 sh60271 cmblab 72 Mar 23 21:18 chr2L_7202325_7204081.frg.dot.gz
0 -rw-r--r--. 1 sh60271 cmblab 0 Mar 23 21:18 chr2L_7202325_7204081.events
4.0K -rw-r--r--. 1 sh60271 cmblab 72 Mar 23 21:18 chr2L_7202325_7204081.ctg.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 690 Mar 23 21:18 chr2L_7202325_7204081.clps
4.0K -rw-r--r--. 1 sh60271 cmblab 537 Mar 23 21:18 chr2L_7202325_7204081.binkmer
8.0K -rw-r--r--. 1 sh60271 cmblab 4.2K Mar 23 21:18 chr2L_7202325_7204081.alignments.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 72 Mar 23 21:18 chr2L_7202325_7204081.3.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 72 Mar 23 21:18 chr2L_7202325_7204081.2.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 567 Mar 23 21:18 chr2L_7202325_7204081.1.reads
0 -rw-r--r--. 1 sh60271 cmblab 0 Mar 23 21:18 chr2L_7202325_7204081.1.nodes
4.0K -rw-r--r--. 1 sh60271 cmblab 72 Mar 23 21:18 chr2L_7202325_7204081.1.dot.gz
4.0K -rw-r--r--. 1 sh60271 cmblab 30 Mar 23 21:18 chr2L_7202325_7204081.ctg.lay.gz
According to your explanation, I assume the expected behavior would be the same number of contigs when I add `-A`. Is there something else that I missed?
Thanks!
Thanks for the report. There seems to be a bug in read clipping. Could you send me the test data?
ruanjue.big@qq.com
Just sent, thanks a lot!
I tested this version and it now produces identical results no matter how many cores I use. Thank you so much!
Btw, I'm wondering if you would consider making a new release for wtdbg2. I can help update the package in conda. Thank you!
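For anyone who wants to repeat the "identical results regardless of core count" check, here is a minimal sketch. It assumes each run was written to its own directory with the same output prefix; the function names and the directory names in the usage note are hypothetical. Files ending in `.gz` are compared after decompression, because a gzip header can embed a timestamp and make byte-identical payloads look different.

```python
import gzip
import hashlib
from pathlib import Path


def content_digest(path):
    """Hash a file's content, decompressing .gz files first."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def differing_outputs(dir_a, dir_b, prefix):
    """Return sorted names of prefix.* files that differ between two run directories."""
    dir_a, dir_b = Path(dir_a), Path(dir_b)
    names = {p.name for d in (dir_a, dir_b) for p in d.glob(prefix + ".*")}
    differing = []
    for name in sorted(names):
        file_a, file_b = dir_a / name, dir_b / name
        if not (file_a.exists() and file_b.exists()):
            differing.append(name)  # present in only one of the runs
        elif content_digest(file_a) != content_digest(file_b):
            differing.append(name)  # same name, different content
    return differing
```

A call such as `differing_outputs("run_t1", "run_t4", "chr2L_7202325_7204081")` would return an empty list when the single-core and four-core runs agree on every output file.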
Thanks for the suggestion. I still need to fix more bugs before releasing a new version.
Got it, looking forward to the future release!
Hi,
Thanks for making this great tool! I was using wtdbg2 for some local assemblies (rs2 data), with an estimated genome size of ~30 kb for each assembly job. Each assembly should also contain a transposable element (<10 kb) in the middle. It looks like wtdbg2 produces no contigs in quite a few cases. I might need to tweak some parameters for this small genome size, but what surprised me is that `wtdbg2` output a different number of contigs when using a single core versus multiple cores. Examples are shown below:
Here I used four cores for the assembly job (21 reads in total); the output is one contig (~12 kb).
Now when I switch to one core for the same assembly job, I get 0 contigs.
In addition, when I switched to `preset2` instead of `rs`, I was able to get a small contig (~5 kb) with one core.
My questions are: how does wtdbg2 use multithreading for the assembly? Do you have any guess as to why wtdbg2 behaves differently with different numbers of threads for local assembly? And do you have any suggestions on what I should do to avoid assembly instability? I read the relevant issue at https://github.com/ruanjue/wtdbg2/issues/61, but it doesn't seem to help.
Thank you for your time! Shunhua