Performance improvement

ruanjue / smartdenovo

Ultra-fast de novo assembler using long noisy reads

GNU General Public License v3.0


Performance improvement #15

Closed tangerzhang closed 6 years ago

tangerzhang commented 6 years ago

Hi Ruan, thanks for providing this fast assembly program. I am using smartdenovo to assemble an insect genome (estimated genome size 350 Mb, but expected to be highly heterozygous) from 170X nanopore raw reads. The first round of smartdenovo produced a 672 Mb assembly with an N50 of 240 Kb. Which parameters should I tune to improve the assembly? In addition, I got only 34.4 Mb of sequence in the prefix.dmo.cns file, which is far less than the estimated genome size. Did I do anything wrong? Looking forward to your reply!
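For readers who want to recompute figures like the total assembly size and N50 quoted above, here is a minimal sketch. It assumes a plain (uncompressed) FASTA file; the helper names `contig_lengths` and `n50` are made up for illustration and are not part of smartdenovo.

```python
def contig_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    current = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                yield current
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        yield current


def n50(lengths):
    """Return the N50: the length of the contig at which the running sum
    of lengths (sorted longest first) reaches half of the total.
    Returns 0 for an empty input."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0
```

A possible use is to open the consensus file (for example `prefix.dmo.cns`), pass the file handle to `contig_lengths`, and then report `sum(lengths)` and `n50(lengths)` for the resulting list.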

ruanjue commented 6 years ago

It is hard to say exactly what happened. Possible ways to fix it:

1. Check your log file for anything wrong, such as a segfault.
2. Try polishing the raw reads using Canu, and then assemble them with SMARTdenovo.

Nanopore reads are likely to have fewer matched k-mers; the default parameters were trained on a PacBio RSII dataset three years ago. I suggest trying -k 15 to compute the alignments again. I haven't run SMARTdenovo on nanopore data yet, so if it works well, please let me know.

Best, Jue
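A rough sketch of how the suggested `-k` rerun could be scripted. The flags `-p` (output prefix), `-k` (k-mer length) and `-c 1` (generate consensus) are given as I understand the `smartdenovo.pl` usage text and should be checked against your installed version; the function names and file names are hypothetical.

```python
import subprocess


def smartdenovo_commands(reads, prefix, k):
    """Build the two commands for one run: generate a makefile, then run it.

    Assumption: smartdenovo.pl prints the makefile to standard output,
    so the caller has to redirect it to a file.
    """
    makefile = f"{prefix}.mak"
    generate = ["smartdenovo.pl", "-p", prefix, "-k", str(k), "-c", "1", reads]
    build = ["make", "-f", makefile]
    return generate, makefile, build


def run_assembly(reads, prefix, k):
    """Run one assembly; raises CalledProcessError if a step fails."""
    generate, makefile, build = smartdenovo_commands(reads, prefix, k)
    with open(makefile, "w") as handle:
        subprocess.run(generate, stdout=handle, check=True)
    subprocess.run(build, check=True)
```

Calling `run_assembly("reads.fa", "asm_k15", 15)` would then write its outputs under the `asm_k15` prefix, so runs with different k values do not overwrite each other.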

tangerzhang commented 6 years ago

Thanks for your quick response.

1. I did not see any error message in the log file.
2. Are you suggesting using CANU-corrected reads or trimmed reads?
3. I will try -k 15 and keep you posted.

ruanjue commented 6 years ago

RE 2: yes, try CANU
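For reference, a sketch of what a Canu correction-only run might look like when driven from Python. It assumes the Canu 1.x command style (`-correct`, `genomeSize=`, `-nanopore-raw`); newer Canu releases changed the read-type options, so check the documentation for your version. The prefix, directory and read file names are hypothetical.

```python
import os


def canu_correct_command(reads, prefix, out_dir, genome_size="350m"):
    """Build a Canu command that stops after the read-correction stage.

    Assumption: Canu 1.x style options; 350m is the estimated genome
    size quoted in this thread.
    """
    return [
        "canu", "-correct",
        "-p", prefix,
        "-d", out_dir,
        f"genomeSize={genome_size}",
        "-nanopore-raw", reads,
    ]


def corrected_reads_path(prefix, out_dir):
    """Path where Canu 1.x is expected to write the corrected reads."""
    return os.path.join(out_dir, f"{prefix}.correctedReads.fasta.gz")
```

The command list can be passed to `subprocess.run(..., check=True)`. The corrected reads file is gzip-compressed, so it may need to be decompressed before being handed to the assembler.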

tangerzhang commented 6 years ago

I think I have solved the problem. I used CANU-corrected reads and tried two k values, -k 15 and -k 16, which led to two quite different assemblies of 172 Mb and 320 Mb, respectively. The -k 16 assembly is comparable to the estimated genome size (350 Mb), and BUSCO analysis showed it to be more reasonable. I will go with -k 16. Thanks again for your suggestion.
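The size comparison in this last comment can be written out as a small calculation. The numbers are the ones reported in the thread; the function names are hypothetical, and size alone is only a first check (the poster also relied on BUSCO).

```python
ESTIMATED_MB = 350               # estimated genome size from the thread
ASSEMBLY_MB = {15: 172, 16: 320}  # k value -> assembly size in Mb


def size_ratio(assembly_mb, estimated_mb=ESTIMATED_MB):
    """Fraction of the estimated genome size covered by the assembly."""
    return assembly_mb / estimated_mb


def closest_to_estimate(sizes, estimated_mb=ESTIMATED_MB):
    """Return the k whose assembly size is nearest the estimate."""
    return min(sizes, key=lambda k: abs(sizes[k] - estimated_mb))


for k, size in sorted(ASSEMBLY_MB.items()):
    print(f"-k {k}: {size} Mb, {size_ratio(size):.0%} of the estimate")
print("closest:", "-k", closest_to_estimate(ASSEMBLY_MB))
```

With these figures the -k 15 assembly covers roughly half of the estimated size and the -k 16 assembly a little over 90%, which matches the poster's choice.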