iqtree / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

Description of -fast option #239

Closed ebraun68 closed 2 months ago

ebraun68 commented 2 months ago

I was curious about the -fast option. The only description I can find in the manual is this:

Turn on the fast tree search mode, where IQ-TREE will just construct two starting trees: maximum parsimony and BIONJ, which are then optimized by nearest neighbor interchange (NNI). Introduced in version 1.6.

I have a few questions about the details of this search:

1) Does IQ-TREE do NNIs on both starting trees, or does it check the likelihood of the MP and BIONJ trees and then choose one of those trees for further branch swapping?

2) How stochastic is this feature given different runs (or different random number seeds)? Is stochasticity coming in for the choice of MP starting tree, in the order of the NNIs, or both?

3) Can a -fast search be combined with a user starting tree?

I apologize if I missed any of this when I tried to find it in the manual.

bqminh commented 2 months ago

Thanks for asking, we should have documented it more clearly. This example command:

iqtree2 -s example.phy -m GTR+G -fast

will do the following steps (with comparison to a normal search when necessary):

  1. Load the alignment and skip the sequence composition test that a normal search performs.
  2. Construct a parsimony tree with stepwise addition + SPR search and initialise the candidate tree set with this one tree.
  3. Estimate model parameters (GTR+G in this case) on the parsimony tree with an optimisation epsilon 0.5 (normal: 0.1).
  4. Compute pairwise distances based on the model and build an NJ tree from the distance matrix.
  5. If NJ tree is different from parsimony tree, put it into the candidate tree set.
  6. For each tree in the candidate set (which has 1 or 2 trees):
     6.1. Optimize it by NNI.
     6.2. If it has a higher likelihood than the best tree, reestimate model parameters with epsilon 0.5 (normal: 0.1) and add it to the candidate set.
  7. Reestimate model parameters on the best tree in the candidate set with epsilon 0.05 (normal: 0.01).
  8. Report the best tree and model parameters as output of IQ-TREE.
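
As a rough illustration, the control flow of the steps above might be sketched in Python as follows. This is not IQ-TREE's actual code: `fast_search` and all of its callable arguments (`build_parsimony`, `build_nj`, `fit_model`, `nni_optimize`) are hypothetical placeholders, trees are represented by plain strings, and the reading of step 6 (only the 1 or 2 starting trees are passed through NNI) is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Tree = str      # placeholder: a tree is just a label in this sketch
Params = dict   # placeholder for estimated model parameters


def fast_search(
    build_parsimony: Callable[[], Tree],
    build_nj: Callable[[Params], Tree],
    fit_model: Callable[[Tree, float], Tuple[Params, float]],
    nni_optimize: Callable[[Tree, Params], Tuple[Tree, float]],
    user_tree: Optional[Tree] = None,
) -> Tuple[Tree, Params, float]:
    """Hypothetical outline of the -fast workflow (steps 2 to 8)."""
    # Step 2: one starting tree seeds the candidate set
    # (a user-supplied tree replaces the parsimony tree).
    start = user_tree if user_tree is not None else build_parsimony()
    candidates: List[Tree] = [start]

    # Step 3: coarse parameter estimation with epsilon 0.5.
    params, best_lnl = fit_model(start, 0.5)
    best_tree = start

    # Steps 4-5: add the NJ tree only when no user tree was given
    # and the NJ tree differs from the starting tree.
    if user_tree is None:
        nj = build_nj(params)
        if nj != start:
            candidates.append(nj)

    # Step 6: NNI-optimise each starting tree (snapshot of the set, so
    # trees added inside the loop are not optimised again).
    for tree in list(candidates):
        new_tree, lnl = nni_optimize(tree, params)
        if lnl > best_lnl:
            params, best_lnl = fit_model(new_tree, 0.5)
            best_tree = new_tree
            candidates.append(new_tree)

    # Step 7: final, tighter parameter estimation with epsilon 0.05.
    params, best_lnl = fit_model(best_tree, 0.05)

    # Step 8: report the best tree and its parameters.
    return best_tree, params, best_lnl
```

Passing the tree builders and optimisers in as callables keeps the sketch self-contained; in the real program these steps are of course internal to IQ-TREE.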

Now your questions:

Q1. NNI is done on both starting trees, according to this workflow.

Q2. Stochasticity is only happening in step 2.

Q3. If you supply a starting tree via the -t TREE_FILE option: step 2 is changed to use this tree, and steps 4-5 do not build the NJ tree, so the candidate set contains only this one tree. The remaining steps stay the same.
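
To see how much the result varies between runs (Q2) or to combine -fast with a starting tree (Q3), one could generate the corresponding command lines with a small helper. The sketch below only builds and prints commands; it does not run IQ-TREE. The helper name `fast_command` and the file names are made up for illustration, and the `-seed` and `--prefix` options are not mentioned in this thread: they are assumed here to be the usual IQ-TREE 2 options for the random seed and the output prefix, so check them against your version's help output.

```python
import shlex
from typing import List, Optional


def fast_command(
    alignment: str,
    model: str,
    seed: Optional[int] = None,
    start_tree: Optional[str] = None,
    prefix: Optional[str] = None,
) -> List[str]:
    """Build an iqtree2 -fast command line as a list of arguments."""
    cmd = ["iqtree2", "-s", alignment, "-m", model, "-fast"]
    if start_tree is not None:
        # Q3: a user tree replaces the parsimony tree in step 2.
        cmd += ["-t", start_tree]
    if seed is not None:
        # Q2: assumed option for fixing the random seed.
        cmd += ["-seed", str(seed)]
    if prefix is not None:
        # Assumed option for the output prefix, so runs do not overwrite each other.
        cmd += ["--prefix", prefix]
    return cmd


# Print one command per seed; compare the resulting trees after running them.
for seed in (1, 2, 3):
    print(shlex.join(fast_command("example.phy", "GTR+G", seed=seed,
                                  prefix=f"fast_seed{seed}")))
```

Since stochasticity enters only through the parsimony tree in step 2, runs started from the same user tree with -t would be expected to give the same result regardless of seed.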