ndierckx / NOVOPlasty

NOVOPlasty - The organelle assembler and heteroplasmy caller

Question about speed of assembly #160

Open RyanGawryluk opened 3 years ago

RyanGawryluk commented 3 years ago

Hi,

Firstly, I really appreciate this software, so thanks a lot for developing it!

I have a question about the large variations in NOVOPlasty's speed that I'm experiencing. I am assembling the mitochondrial genomes of several novel microbes, at ~60-70 kbp, from paired-end Illumina 2 x 150 bp reads. In each case, I normalized the read sets with bbnorm to make the datasets more manageable, and each species now has a total of ~60 million read pairs.

For the first species, I ran NOVOPlasty, and it produced a nice, circular mtDNA in about 20 minutes. For the next species, it has been running for > 24 hours, with no end in sight. In the log file of each, I set a k-mer length of 31 and a max memory of 15 GB (though according to the 'top' command, it looks to be using more like 100 GB). These are likely fairly similar mtDNAs and dataset sizes; what could be causing the huge difference in run time, and what's the best way of getting around this?

Thanks!

ndierckx commented 3 years ago

Hi,

If it takes that long, it could be a bug. Could you set the extended log to 1 in the config, run it again, and send me that file?

Do you see the length of the assembled sequence increasing?

And make sure you use the latest version.

RyanGawryluk commented 3 years ago

OK, thanks. I will update to the latest version and run it again with the extended log file.

I did see the length increasing initially (to 96,410 bp, a bit higher than expected). But since then, I can't tell that anything is really happening.

ndierckx commented 3 years ago

After it reached that length, it got stuck because of a bug, so you can terminate the assembly; it will not finish. For the extended log run, you can also terminate it once the length stops increasing.
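
For anyone who wants to automate the "terminate once the length stops increasing" check, a minimal Python sketch follows. It assumes you record the assembled length yourself at regular intervals; how to read that value from the NOVOPlasty output is not covered in this thread, so the function only works on a plain list of numbers. The names `has_stalled` and `window` are hypothetical and not part of NOVOPlasty.

```python
from typing import Sequence


def has_stalled(lengths: Sequence[int], window: int = 3) -> bool:
    """Return True if the last `window` samples show no growth.

    `lengths` is a list of assembled-sequence lengths (in bp) that the
    user sampled at regular intervals, oldest first. This is an
    illustrative helper, not a NOVOPlasty function.
    """
    # Need one baseline sample plus `window` later samples to decide.
    if len(lengths) < window + 1:
        return False
    recent = lengths[-(window + 1):]
    # Stalled if none of the later samples exceeds the baseline sample.
    return max(recent) <= recent[0]


if __name__ == "__main__":
    # Hypothetical samples; only the 96,410 bp plateau comes from the thread.
    samples = [1000, 50000, 96410, 96410, 96410, 96410]
    print(has_stalled(samples))
```

A larger `window` (or a longer sampling interval) makes the check more conservative, which may be preferable if the assembly sometimes pauses before extending again.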