much worse assembly N50 in new version of wtdbg2

ruanjue / wtdbg2

Redbean: A fuzzy Bruijn graph approach to long noisy reads assembly

GNU General Public License v3.0


much worse assembly N50 in new version of wtdbg2 #47

Closed ishengtsai closed 5 years ago

ishengtsai commented 6 years ago

Hi,

I have around 30X coverage of a 1.5 Gb insect genome. When I reassemble with the latest version, I get a much worse N50 and a larger assembly. Could anyone comment on which parameters I should tweak? Thanks.

Version 1.1.006: assembly size 1.671 Gb, N50 3 Mb, largest contig 16.1 Mb

Version 2.2: assembly size 2.237 Gb, N50 132 kb, largest contig 1.8 Mb

parameters: -p19 -AS2 -e2 in both cases.
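For anyone comparing the two runs on the same footing, here is a minimal sketch of how the figures above (assembly size, N50, largest contig) could be recomputed from a contig FASTA. The function names are illustrative and not part of wtdbg2; it assumes plain FASTA lines passed in as an iterable.

```python
def contig_lengths(fasta_lines):
    """Return the length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read, None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def assembly_stats(lengths):
    """Summarise an assembly by total size, N50 and largest contig."""
    return {
        "total": sum(lengths),
        "n50": n50(lengths),
        "largest": max(lengths, default=0),
    }
```

Usage would be along the lines of `assembly_stats(contig_lengths(open("contigs.fa")))`, where `contigs.fa` is a placeholder filename, run once per assembly.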

ruanjue commented 6 years ago

What kind of sequencing technology did you use?

ishengtsai commented 6 years ago

Sorry! I forgot to mention. Nanopore 1D reads.

ruanjue commented 6 years ago

Could you send the wtdbg2.2 log file to my email, ruanjue AT gmail.com?

ruanjue commented 6 years ago

Thanks for suggestions.

There were three major differences between the two runs, wtdbg-2.1 and wtdbg-2.2.

1) '-K 1000' in 2.1, but '-K 23368' in 2.2. This is because wtdbg2.2 sets the default cutoff to ">= 1000 and >= (1 - 0.05)". It is a real problem, and I will try to find a good way to set the cutoff. For now, please set '-K 1000.0'.

2) 2.1 used 9,118,484 reads, while 2.2 used 7,094,607 reads, although you gave the same input filenames. Let's check what is wrong there.

3) 2.2 deletes multiple alignments between two reads, while 2.1 does not. I will provide an option to control this.

Best, Jue
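To illustrate point 1, below is a rough sketch of how a rule of the form ">= 1000 and >= (1 - 0.05)" could end up with an effective cutoff far above 1000 on a repetitive genome. This is only one reading of the quoted rule, not wtdbg2's actual code: the function name, the percentile indexing, and the handling of a zero fraction are all assumptions.

```python
def effective_cutoff(kmer_counts, min_count=1000, top_fraction=0.05):
    """Hypothetical model of a combined k-mer frequency cutoff.

    Assumption: a k-mer is filtered only if its count is >= min_count AND it
    ranks in the top `top_fraction` of k-mers by count, so the effective
    cutoff is the larger of the two thresholds.
    """
    # Assumption: a zero fraction disables the percentile condition,
    # which would match the behaviour expected from '-K 1000.0'.
    if top_fraction <= 0 or not kmer_counts:
        return min_count
    ordered = sorted(kmer_counts)
    index = min(int((1 - top_fraction) * len(ordered)), len(ordered) - 1)
    return max(min_count, ordered[index])


# Toy data: 10% of k-mers are highly repetitive, so the percentile
# threshold lands inside the repetitive block and dominates min_count.
toy_counts = [10] * 90 + [5000] * 10
print(effective_cutoff(toy_counts))                    # percentile condition active
print(effective_cutoff(toy_counts, top_fraction=0.0))  # count threshold only
```

Under this model, the more repetitive k-mers the data contains, the higher the percentile threshold climbs, which is at least consistent with seeing a value like 23368 instead of 1000.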

ishengtsai commented 6 years ago

Many thanks.

Just checked: there are 9,118,419 reads in total, so I suspect the slightly higher number in 2.1 was due to splitting of ultra-long reads, while 2.2 really excluded a lot of shorter reads?

Does the change in point 3 make the assembly quicker but less contiguous?

ruanjue commented 6 years ago

2) The total base pairs differed a lot: 2.1 took 79 Gb, 2.2 took 62 Gb. Could you try running on just one or two of the input files? I am not sure why the total bases differ.

3) Not quicker, but it gives more contiguity in my tests.
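For the per-file check in point 2, a small script like the following could count reads and bases in each FASTQ file, so the numbers can be compared against what each wtdbg2 version reports in its log. It is a sketch under the assumption that every record is exactly four lines (no wrapped sequences); the function names are illustrative.

```python
import gzip


def open_maybe_gzip(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_stats(path):
    """Return (read_count, total_bases) for a 4-line-per-record FASTQ file."""
    reads = 0
    bases = 0
    with open_maybe_gzip(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence line of each record
                reads += 1
                bases += len(line.rstrip("\r\n"))
    return reads, bases


def summarize(paths):
    """Return per-file stats plus overall read and base totals."""
    per_file = {p: fastq_stats(p) for p in paths}
    total_reads = sum(r for r, _ in per_file.values())
    total_bases = sum(b for _, b in per_file.values())
    return per_file, total_reads, total_bases
```

Running `summarize` on one or two of the input files and then on all of them should show which files, if any, account for the gap between the two versions.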

ishengtsai commented 6 years ago

There are 10 fastq files in total. Do you mean I should try using one merged fastq file?
So to test this option, should I set -K 1000.0 and rerun v2.2?

ruanjue commented 6 years ago

2) Not merged; just run one or two of them to see the difference.

3) I haven't updated it on GitHub.