ruanjue / smartdenovo

Ultra-fast de novo assembler using long noisy reads
GNU General Public License v3.0

missing one entire chromosome in the final assembly #19

Closed by yjx1217 6 years ago

yjx1217 commented 6 years ago

Hello ruanjue,

I have recently been testing smartdenovo on our own ONT data (S. cerevisiae, 200X coverage). The final assembly looks good in general, except that one of the chromosomes (chrIII) is completely missing. All the other chromosomes look good, even the most difficult one, chrXII. I obtained the same result in multiple independent runs on different machines, so I guess this must be related to smartdenovo. Do you have any suggestions? I can share my input reads for testing if you send me your email address. Another observation is that smartdenovo seems to consume quite a lot of memory in certain intermediate steps. Do you have any suggestions about this as well? Thanks in advance!

Best, Jia-Xing

ruanjue commented 6 years ago

Dear Jia-Xing,

Missing a whole chromosome sequence is very rare. Please note that the final assembly is full of sequencing errors and still needs to be polished by another program, which may cause the chromosome to appear missing in alignments. Also check your reference sequences. If everything is fine there, please compare the number of contigs in the .lay file with the number in the final cns file. The chromosome may have been lost in wtcns.
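The contig-count comparison suggested above could be sketched roughly as follows. This is only an illustration: it assumes that each contig in the layout file starts with a line beginning with ">" (as FASTA records in the cns file do), which should be verified against the actual .lay output. The function names and file paths are hypothetical.

```python
def count_header_lines(path, marker=">"):
    """Count the lines in a text file that start with `marker`."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(marker))


def compare_contig_counts(lay_path, cns_path):
    """Return (layout count, consensus count, difference).

    Assumption: one '>' line per contig in both files. A positive
    difference would suggest that contigs were lost between the
    layout and consensus steps.
    """
    lay_n = count_header_lines(lay_path)
    cns_n = count_header_lines(cns_path)
    return lay_n, cns_n, lay_n - cns_n
```

If both counts agree and the chromosome is still absent, the loss probably happened before the layout step rather than in wtcns.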

About memory, do you mean wtzmo? -G 10 will use about 1/10 of the RAM to build the k-mer index; give it a try. -S 8 will use half the RAM of -S 4. Have you tried wtdbg-1.2.8? It is more memory-efficient.

Best, Jue

yjx1217 commented 6 years ago

Dear Jue,

Thanks for the quick reply!

About the missing chromosome, I've also checked the unitigs in the *.dmo.lay.utg file, and chrIII is missing there as well. In comparison, another assembler that I tried, flye, has no problem recovering that chromosome, so I guess this might be a smartdenovo-specific problem. Again, I can share my test data if it would help the further development of smartdenovo.

About memory, I used the provided wrapper smartdenovo.pl for my run:

smartdenovo.pl -p $prefix -t $threads -c 1 ./../$reads > $prefix.mak
make -f $prefix.mak

And yes, I think wtzmo is the memory-consuming step. It would be great if you could expose some of these additional parameters in the wrapper to allow more flexible control over RAM consumption.

Best, Jia-Xing

ruanjue commented 6 years ago

Ok, please share the dataset with me (ruanjue.big AT qq.com).

ruanjue commented 6 years ago

One long read spanned the whole of chrIII. All the other chrIII reads were contained within it and were therefore discarded from the assembly graph. However, a single read cannot form a contig under SMARTdenovo's definition, so the chromosome was dropped. One way to avoid this is to remove the super-long read.
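A minimal sketch of that workaround, assuming the reads are in FASTA format: drop every read longer than a chosen cutoff before running the assembler. The helper names and the cutoff are hypothetical and are not part of SMARTdenovo; a sensible cutoff depends on the genome and would need to be chosen so that no single read spans a whole chromosome.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_long_reads(records, max_len):
    """Keep only reads whose length is at most max_len bases."""
    for header, seq in records:
        if len(seq) <= max_len:
            yield header, seq


def filter_fasta(in_path, out_path, max_len):
    """Write reads of at most max_len bases to out_path.

    Returns the number of reads that were kept.
    """
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for header, seq in drop_long_reads(read_fasta(src), max_len):
            dst.write(f">{header}\n{seq}\n")
            kept += 1
    return kept
```

The filtered file can then be passed to smartdenovo.pl in place of the original reads.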