iqtree / iqtree2


Better? parallelisation for MF2 #19

Open roblanf opened 3 years ago

roblanf commented 3 years ago

Currently we parallelise MF2 by sending each subset to its own thread. This is OK, but we will often miss out on a lot of potential efficiency.

E.g. imagine we have 100 available processors, and we're analysing a dataset with 10 data blocks and we want to fit 100 models to each data block.

Currently we can only use 10 threads for this, so we can achieve at most 10% efficiency.
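
For concreteness, here is an illustrative sketch of the current one-thread-per-block scheme, which is what caps us at 10 busy cores in the example above. This is not IQ-TREE's actual code: DataBlock, Model and evaluate_model are hypothetical stand-ins.

```cpp
// Illustrative sketch only, not IQ-TREE's actual code.
// Each data block gets one thread and evaluates all of its candidate models
// serially, so with 10 blocks on a 100-core machine at most 10 cores are busy.
#include <thread>
#include <vector>

struct DataBlock { /* alignment subset */ };
struct Model     { /* substitution model specification */ };

void evaluate_model(const DataBlock&, const Model&) { /* fit one model (stub) */ }

void current_scheme(const std::vector<DataBlock>& blocks,
                    const std::vector<Model>& candidates) {
    std::vector<std::thread> workers;
    for (size_t b = 0; b < blocks.size(); ++b) {
        // One thread per data block, regardless of how many cores exist.
        workers.emplace_back([&, b] {
            for (const Model& m : candidates)   // all models for this block, serially
                evaluate_model(blocks[b], m);
        });
    }
    for (std::thread& t : workers) t.join();
}
```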

We can refactor the parallelisation to speed this up though, as follows.

  1. Start by using 10 threads to estimate the most complex model for each data block. This is obviously limiting, but since it's only one model per data block it will also be quick.
  2. Use the result from step 1 to set the initial parameters of the models for all of the less complex models, as we currently do.
  3. Create a job queue containing all of the remaining 99 models for each of the 10 data blocks (990 jobs), with parameters initialised from step 2. (This may be where there are limitations: e.g. if estimates from one free-rate model are used to initialise other free-rate models, the dependent jobs might need to be packaged into related subsets, e.g. all the free-rate jobs for a block go to the same processor because they need to build on each other.)
  4. Order the job queue roughly according to how long we think each job will take
  5. Run the jobs

In this example, that takes us from a maximum of 10% efficiency to a maximum of 99% efficiency (a rough sketch of the whole scheme is given below).
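
Under the assumptions above (10 blocks × 100 candidate models, ~100 available cores), a minimal sketch of the proposed scheme might look like the following. This is not IQ-TREE code: DataBlock, Model, FitResult, fit_model, estimated_cost and init_from are hypothetical stand-ins for the real routines, and the free-rate packaging caveat from step 3 is ignored here.

```cpp
// Sketch of steps 1-5: seed each block from its most complex model, then spread
// all remaining (block, model) jobs over every available thread.
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

struct DataBlock { /* alignment subset */ };
struct Model {
    bool is_most_complex = false;
    /* substitution model specification */
};
struct FitResult { /* optimised parameters, log-likelihood, ... */ };

// Stubs standing in for the real optimisation and bookkeeping routines.
FitResult fit_model(const DataBlock&, const Model&) { return {}; }
double    estimated_cost(const Model&)              { return 1.0; }
Model     init_from(const Model& m, const FitResult&) { return m; }

struct Job { size_t block; Model model; };

void proposed_scheme(const std::vector<DataBlock>& blocks,
                     const std::vector<Model>& candidates,
                     unsigned n_threads) {
    // Step 1: fit the most complex model on each block (one thread per block).
    std::vector<FitResult> seed(blocks.size());
    {
        std::vector<std::thread> workers;
        for (size_t b = 0; b < blocks.size(); ++b)
            workers.emplace_back([&, b] {
                for (const Model& m : candidates)
                    if (m.is_most_complex) seed[b] = fit_model(blocks[b], m);
            });
        for (auto& t : workers) t.join();
    }

    // Steps 2-3: one flat queue of the remaining (block, model) jobs,
    // each initialised from the step-1 result for its block.
    std::vector<Job> queue;
    for (size_t b = 0; b < blocks.size(); ++b)
        for (const Model& m : candidates)
            if (!m.is_most_complex)
                queue.push_back({b, init_from(m, seed[b])});

    // Step 4: roughly order jobs from slowest to fastest so long jobs start early.
    std::sort(queue.begin(), queue.end(), [](const Job& a, const Job& b) {
        return estimated_cost(a.model) > estimated_cost(b.model);
    });

    // Step 5: n_threads workers pull jobs from the shared queue.
    std::atomic<size_t> next{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < n_threads; ++w)
        workers.emplace_back([&] {
            for (size_t i = next++; i < queue.size(); i = next++)
                fit_model(blocks[queue[i].block], queue[i].model);
        });
    for (auto& t : workers) t.join();
}
```

The key change is that the unit of parallel work becomes a (block, model) pair rather than a whole block, so all 990 remaining jobs can be spread across the full set of cores.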

In terms of ordering jobs, maybe there's already something in IQ-TREE to estimate execution time for a job. If not, then in PF2 we use really crude estimates based on gut feelings about how long different types of model tend to take to optimise. E.g. JC is faster than GTR, GTR is faster than GTR+I and GTR+G, those are all a lot faster than GTR+I+G, and GTR+Rx models are the slowest.
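
If IQ-TREE has nothing built in for this, a crude heuristic in the spirit of the PF2 approach described above could just map model classes to hand-picked relative weights and sort on those. The weights below are illustrative placeholders, not measured values.

```cpp
// Rough, hand-tuned relative cost per model class, used only to order the
// job queue (JC < GTR < GTR+I / GTR+G < GTR+I+G < GTR+Rx).
#include <string>

double rough_cost(const std::string& model) {
    // Free-rate models (GTR+R2, GTR+R3, ...) are treated as the slowest class.
    if (model.find("+R") != std::string::npos) return 8.0;
    if (model.find("+I") != std::string::npos &&
        model.find("+G") != std::string::npos) return 5.0;  // e.g. GTR+I+G
    if (model.find("+G") != std::string::npos ||
        model.find("+I") != std::string::npos) return 3.0;  // e.g. GTR+G, GTR+I
    if (model.rfind("GTR", 0) == 0) return 2.0;             // plain GTR
    return 1.0;                                             // e.g. JC
}
```

Something like this could stand in for the estimated_cost() used in the sketch above.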

roblanf commented 2 years ago

OK, we are shelving this until we know whether a sensible default is to use ModelRevelator or just GTR models for the merging.

Changing the parallelisation in MF2 is only smart if we end up wanting to calculate likelihoods for a lot of models for every partition.

Thernn88 commented 2 years ago

It would also be smart to do the threading test AFTER ModelFinder is performed. Otherwise, if you set threading to auto, it runs the test first and picks some silly number like 3 threads, which vastly slows down your ModelFinder run. Model fitting scaled linearly with cores in the tests I did on my 3990X.