Closed pi3rrr3 closed 4 years ago
It looks like a highly repetitive genome. The first step is to check the CPU usage with top, to see whether wtdbg2 is using nearly all 40 cores. The next step is to re-run wtdbg2 with a larger k-mer size so that it finishes faster. I suggest -x sq -k 0 -p 19.
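The two steps above could be sketched roughly as follows. This is only an illustration: the binary and read paths are copied from the original post, the output prefix dbg_k0_p19 is a made-up name, and -L 5000 from the original command is left out here on the assumption that only the k-mer options are being changed (add it back if you want the same read-length filter).

```shell
# Sketch only: paths come from the original post; the output prefix is hypothetical.
wtdbg2_bin="$HOME/tools/wtdbg2/wtdbg2"
reads="raw/fasta/combined_reads.fa.gz"
prefix="assembly/dbg_k0_p19"

# 1. If wtdbg2 is still running, take one batch-mode snapshot of its CPU usage.
#    With 40 busy threads, top would typically show a %CPU value near 4000.
if pids=$(pgrep -d, -x wtdbg2 2>/dev/null); then
  top -b -n 1 -p "$pids"
fi

# 2. Build the re-run command with the suggested options (-x sq -k 0 -p 19).
cmd="$wtdbg2_bin -x sq -g 300m -k 0 -p 19 -i $reads -t 40 -fo $prefix"
echo "$cmd"

# Uncomment to launch it once the printed command looks right.
# $cmd
```

The command is printed rather than executed so it can be checked against the existing run first.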
Thanks for the quick answer, I am trying these parameters now and will let you know.
Also, would increasing -L help? I have 30X coverage with reads over 12kb.
First, try increasing the k-mer size. wtdbg2 automatically selects 50X of data from the input reads.
Do we need to set -X if we have more than 50X data?
The default is -X 50. Please have a look at the usage.
-X <float>, --rdcov-cutoff <float>
Default: 50.0. Retaining 50.0 folds of genome coverage ...
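As a rough back-of-the-envelope check of what that default means for the numbers in this thread, here is a small sketch. It assumes the cutoff is applied relative to the genome size passed with -g, which the truncated usage text above does not spell out; the variable names are only illustrative.

```shell
# Values taken from the original post; the interpretation of -X is an assumption.
genome_mb=300     # -g 300m
input_cov=70      # reported PacBio coverage
cutoff_cov=50     # default -X

input_mb=$((genome_mb * input_cov))

# Coverage kept is the smaller of the input coverage and the cutoff.
if [ "$input_cov" -gt "$cutoff_cov" ]; then
  kept_cov=$cutoff_cov
else
  kept_cov=$input_cov
fi
kept_mb=$((genome_mb * kept_cov))

echo "input: ${input_mb} Mb, retained: ${kept_mb} Mb (${kept_cov}X)"
# prints: input: 21000 Mb, retained: 15000 Mb (50X)
```

Under the same assumption, a 30X dataset is below the cutoff, so the default -X would leave it untouched.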
Hi,
I am trying to use the latest version of wtdbg2 to assemble a ~300 Mb insect genome from 70X PacBio Sequel data. Installation was flawless, it just output the following:
I am running the following command with 40 cores and 256 Gb of RAM:
~/tools/wtdbg2/wtdbg2 -x sq -g 300m -L 5000 -i raw/fasta/combined_reads.fa.gz -t 40 -fo assembly/dbg_l5k_par
Kmer indexing step went fine (see output below), but the alignment step has been running for almost four days now. The .alignments file size keeps growing (1.3 Gb at the moment).
Any idea what might be going wrong? Thanks for your help, and for the fine piece of software!