snayfach / MicrobeCensus

MicrobeCensus estimates the average genome size of microbial communities from metagenomic data
http://genomebiology.com/2015/16/1/51
GNU General Public License v3.0

MicrobeCensus should use requested # of cores #9

Closed: taltman closed this issue 8 years ago

taltman commented 9 years ago

Even when I tell MicrobeCensus (MC) to use 16 cores, monitoring with 'top' shows that it never uses more than two or three cores at a time. It took more than 3 hours to process a 2.3 GB read file, sampling all reads. It would have been much faster if it could fully exploit the available cores.

snayfach commented 8 years ago

The specified number of cores is passed to RAPsearch2 during the alignment step. I've found that RAPsearch2 does run faster with more cores, but the increase in speed is not linear: you get a 60% increase going from 1 to 2 cores, but only a 10% increase going from 4 to 8 cores. So this is really a limitation of RAPsearch2 that cannot be solved in this package.
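To put those two figures in terms of wall-clock time, here is a rough worked calculation. It assumes that "increase in speed" means an increase in throughput, so a 60% increase corresponds to a speedup factor of 1.6. The function name `runtime_fraction` is made up for this illustration and is not part of MicrobeCensus or RAPsearch2.

```python
def runtime_fraction(speed_increase_pct):
    """Fraction of the previous runtime that remains after a given % increase in speed.

    Assumes "increase in speed" means throughput, so +60% is a 1.6x speedup.
    """
    return 1.0 / (1.0 + speed_increase_pct / 100.0)


# Figures quoted in the comment above; each step doubles the core count.
steps = {"1 -> 2 cores": 60.0, "4 -> 8 cores": 10.0}

for label, pct in steps.items():
    remaining = runtime_fraction(pct)
    print(f"{label}: runtime falls to {remaining:.1%} of the previous run")
```

Under that assumption, doubling from 1 to 2 cores cuts the runtime to 62.5% of what it was, while doubling from 4 to 8 cores only cuts it to about 90.9%.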

The best way to increase speed is to use only a subset of reads for average genome size (AGS) estimation. The default is 1 million reads, which should work well for most communities. Even using as many as 5 million reads will still be much faster than processing the entire dataset.
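For anyone who wants to subsample a read file themselves before running the tool, a minimal sketch using reservoir sampling might look like the following. This is a generic technique, not how MicrobeCensus selects its reads internally. The function name `sample_fastq` is hypothetical, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record.

```python
import random


def sample_fastq(path, n_reads, seed=0):
    """Pick n_reads records uniformly at random from a FASTQ file.

    Uses reservoir sampling, so the file is read once and only n_reads
    records are held in memory. Assumes uncompressed, 4-line FASTQ records.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(path) as handle:
        index = 0
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # empty string means end of file
                break
            if index < n_reads:
                reservoir.append(record)
            else:
                # Replace a stored record with probability n_reads / (index + 1).
                slot = rng.randint(0, index)
                if slot < n_reads:
                    reservoir[slot] = record
            index += 1
    return reservoir
```

A call such as `sample_fastq("reads.fastq", 1_000_000)` (the file name is a placeholder) would return up to 1 million records, or every record if the file holds fewer than that.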