ruanjue / smartdenovo

Ultra-fast de novo assembler using long noisy reads
GNU General Public License v3.0

How to run wtzmo parallelly #33

Jolvii85 closed this issue 5 years ago

Jolvii85 commented 5 years ago

Hi Ruanjue

I am assembling a big genome (>20 Gbp) and would like to run wtzmo in parallel, e.g. split into 10 parts. Can you give an example of how to set the parameters? Also, if I run wtzmo in parallel, will the memory usage be reduced?

Thank you very much.

ruanjue commented 5 years ago
wtzmo -P 10 -p 0 ...
wtzmo -P 10 -p 1 ...
...
wtzmo -P 10 -p 9

If run in parallel, each job will use the same amount of memory as a standalone run.

As you are assembling a 20G genome, please try wtdbg2 (v2.3), which is designed to handle huge genomes.

Jue
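
For readers who want to script this, the per-part commands above can be generated with a small loop. This is only a sketch: the `-P` (number of parts) and `-p` (part index, starting at 0) options come from the reply above, while the function name `gen_wtzmo_cmds` and the `<other wtzmo options>` placeholder are hypothetical and stand in for whatever options the standalone run would use. The function only prints the command lines, so they can be reviewed before being handed to a job scheduler.

```shell
# Print one wtzmo command line per part (dry run; nothing is executed).
# $1 = total number of parts; remaining arguments = options shared by all parts.
gen_wtzmo_cmds() {
    total=$1
    shift
    i=0
    while [ "$i" -lt "$total" ]; do
        # -P is the number of parts, -p the zero-based index of this part
        echo "wtzmo -P $total -p $i $*"
        i=$((i + 1))
    done
}

# Hypothetical usage: replace the placeholder with the real wtzmo options.
gen_wtzmo_cmds 10 '<other wtzmo options>'
```

Since each job needs as much memory as a standalone run, splitting mainly helps spread the work over several machines rather than lowering the per-job footprint.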

Jolvii85 commented 5 years ago

Hi, I used a small genome to test the wtzmo command. I combined the output of all the wtzmo jobs and then ran wtclp. The resulting obt file has the same number of rows as the one from the non-parallel run, but after sorting each obt file there are some discrepancies. Is that OK? There are 4300 lines, of which 16 are different (see below).

I also tested the whole pipeline. When I run wtzmo in parallel, the length of the assembly is similar to that of the non-parallel run, but the number of scaffolds is a little higher.

    3136c3136
    < pb000000003137 0 0 46937 0 0 1
    ---
    > pb000000003137 0 46937 46937 0 46937 0
    3433c3433
    < pb000000003434 0 32304 32304 0 32304 0
    ---
    > pb000000003434 0 0 32304 0 0 1
    3557c3557
    < pb000000003558 0 0 66886 0 0 1
    ---
    > pb000000003558 0 66886 66886 0 66886 0
    3764c3764
    < pb000000003765 0 43177 43177 0 43177 0
    ---
    > pb000000003765 0 0 43177 0 0 1
    4001c4001
    < pb000000004002 0 30644 30644 0 30644 0
    ---
    > pb000000004002 0 0 30644 0 0 1
    4018c4018
    < pb000000004019 0 31422 31422 0 31422 0
    ---
    > pb000000004019 0 0 31422 0 0 3
    4044c4044
    < pb000000004045 0 30629 30629 0 30629 0
    ---
    > pb000000004045 0 0 30629 0 0 2

ruanjue commented 5 years ago

It is OK. Splitting the reads changes the k-mer table, which results in some differences.
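
To quantify such differences between two obt files, one option is to sort both files and count the records that appear in only one of them. The sketch below assumes plain-text files with one record per line; the function name `count_differing_records` and the file names in the usage line are hypothetical. It relies only on the standard `sort`, `comm`, and `wc` utilities.

```shell
# Count lines that occur in only one of two files, ignoring line order.
# $1, $2 = the two files to compare (e.g. obt files from two runs).
count_differing_records() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    # Use the same byte-wise collation for sort and comm
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    # comm -3 suppresses lines common to both inputs, leaving the unique ones
    LC_ALL=C comm -3 "$a_sorted" "$b_sorted" | wc -l | tr -d ' '
    rm -f "$a_sorted" "$b_sorted"
}

# Hypothetical usage:
# count_differing_records standalone.obt parallel.obt
```

A record that differs between the two runs is counted twice here, once for each file, because both versions of the line are unique to their side.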