iqtree / iqtree2

NEW location of IQ-TREE software for efficient phylogenomic software by maximum likelihood http://www.iqtree.org
GNU General Public License v2.0

`-m MFP` fails with partition option `-q` #311

Open roblanf opened 3 months ago

roblanf commented 3 months ago

Here are three simple analyses, each a 3-partition analysis of a small COI alignment. The only difference between them is the option that controls how branch lengths are treated across partitions.

The partition file looks like this:

DNA, codon_1 = 1-1527\3
DNA, codon_2 = 2-1527\3
DNA, codon_3 = 3-1527\3
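As a quick sanity check on the partition file above, the sketch below counts how many alignment columns each charset covers, reading `start-end\3` as "every third site from start to end". The helper name `count_sites` is made up for this illustration and is not part of IQ-TREE.

```shell
# count_sites START END STEP: number of positions START, START+STEP, ... that are <= END.
# (Hypothetical helper for illustration only.)
count_sites() {
  start=$1; end=$2; step=$3
  n=0
  i=$start
  while [ "$i" -le "$end" ]; do
    n=$((n + 1))
    i=$((i + step))
  done
  echo "$n"
}

# One line per codon position, matching the three charsets in the partition file.
for pos in 1 2 3; do
  echo "codon_$pos: $(count_sites "$pos" 1527 3) sites"
done
```

Each charset should cover the same number of sites, and the three together should account for all 1527 columns.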

And the analyses were:

iqt="./iqtree-2.3.6-macOS/bin/iqtree2"
aln="COIalignment.fasta"
threads=3

$iqt -s $aln -q 1_2_3_codon_partition.txt -T $threads -m MFP -redo -safe -pre 1_2_3_identical
$iqt -s $aln -p 1_2_3_codon_partition.txt -T $threads -m MFP -redo -safe -pre 1_2_3_relative 
$iqt -s $aln -Q 1_2_3_codon_partition.txt -T $threads -m MFP -redo -safe -pre 1_2_3_independent

The issue is that the first analysis fails with the following error in ModelFinder:

NOTE: ModelFinder requires 16 MB RAM!
Selecting individual models for 3 charsets using BIC...
ERROR: Fixing branch lengths not supported under specified site rate model

The second and third analyses work fine, e.g. producing output in the .iqtree file like this from ModelFinder:

ModelFinder
-----------

Best-fit model according to BIC: TIM3e+G4:codon_1,F81+F+R2:codon_2,TN+F+R3:codon_3

List of best-fit models per partition:

  ID  Model                  LogL         AIC      w-AIC        AICc     w-AICc         BIC      w-BIC
   1  TIM3e+G4          -1836.803    3683.605 + 3.01e-314    3683.724 - -2.22e+193    3704.767 -        0
   2  F81+F+R2           -802.091    1616.181 + 3.01e-314    1616.349 - -2.22e+193    1641.576 -        0
   3  TN+F+R3           -8605.747   17231.493 + 3.01e-314   17231.935 - -2.22e+193   17273.818 -        0

My expectation was that the first analysis would do something similar, but it fails partway through ModelFinder. My guess is that the problem lies in estimating free-rate models (+R) on partitioned data with identical branch lengths. I can't see why this should be a problem in theory, but in practice it causes IQ-TREE to fail with an error about fixing branch lengths.

roblanf commented 3 months ago

Small update. I am pretty sure this is a bug that can be overcome.

I ran another three analyses, this time with the partition file defining just a single partition, so the only thing I change is the partition file:

DNA, codon_123 = 1-1527

Because of this, all analyses are now in practice identical - all have just a single set of branch lengths, because there's only one partition.

Same three analyses:

$iqt -s $aln -q 123_codon_partition.txt -T $threads -m MFP -redo -pre 123_identical
$iqt -s $aln -p 123_codon_partition.txt -T $threads -m MFP -redo -pre 123_relative
$iqt -s $aln -Q 123_codon_partition.txt -T $threads -m MFP -redo -pre 123_independent

The first analysis with -q (identical branch lengths) fails with the same error:

ERROR: Fixing branch lengths not supported under specified site rate model

The other two work, and give identical models (and near-identical likelihoods, as expected) if I set -seed to the same value and run on one thread. As expected, they also give the same answer as running without a partition file.

For example, -p (relative branch lengths) gives:

ModelFinder
-----------

Best-fit model according to BIC: GTR+F+I+G4:codon_123

List of best-fit models per partition:

  ID  Model                  LogL         AIC      w-AIC        AICc     w-AICc         BIC      w-BIC
   1  GTR+F+I+G4       -12305.824   24633.647 + 3.03e-314   24633.822 -      216   24692.289 -        0
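The seeded, single-thread comparison of the two working runs can be sketched roughly as follows. The seed value `12345` and the `123_seeded_*` output prefixes are placeholders chosen for illustration, not the values actually used; the binary and alignment paths are the ones from the first comment. The commands are only executed if the binary is found at that path.

```shell
iqt="./iqtree-2.3.6-macOS/bin/iqtree2"
aln="COIalignment.fasta"
seed=12345  # placeholder seed; any fixed value works

# Build the ModelFinder command line for one partition option (p or Q),
# pinned to a single thread and a fixed seed so the runs are comparable.
build_cmd() {
  echo "$iqt -s $aln -$1 123_codon_partition.txt -T 1 -seed $seed -m MFP -redo -pre 123_seeded_$1"
}

for opt in p Q; do
  cmd=$(build_cmd "$opt")
  echo "$cmd"
  # Only run when the IQ-TREE binary is actually present.
  if [ -x "$iqt" ]; then $cmd; fi
done

# Compare the best-fit model lines from the two reports, if both exist.
if [ -f 123_seeded_p.iqtree ] && [ -f 123_seeded_Q.iqtree ]; then
  grep "Best-fit model according to BIC" 123_seeded_p.iqtree 123_seeded_Q.iqtree
fi
```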

So, the mystery is why the first command doesn't work:

$iqt -s $aln -q 123_codon_partition.txt -T $threads -m MFP -redo -pre 123_identical

thomaskf commented 2 months ago

@roblanf Thank you for reporting the issue, and sorry for the delayed response. The problem arises from an incompatibility between the "-q" option (which makes all partitions share the same branch lengths) and the free-rate model of rate heterogeneity across sites (+R). The optimization method for the free-rate model requires flexibility in the branch lengths, so this issue is not easy to resolve. I will discuss this with Minh to explore potential solutions.

In the meantime, please use the "-m TEST" option, which excludes the +R model.
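Applied to the first failing command in this thread, the workaround might look like the sketch below. It reuses the paths and partition file from the original report and changes only `-m MFP` to `-m TEST`; the output prefix `1_2_3_identical_TEST` is a made-up name for illustration. The command is printed, and only run if the binary exists at the given path.

```shell
iqt="./iqtree-2.3.6-macOS/bin/iqtree2"
aln="COIalignment.fasta"
threads=3
model="TEST"  # suggested in place of MFP; excludes the +R model

# Same -q (shared branch lengths) analysis as the failing run, with the model option swapped.
cmd="$iqt -s $aln -q 1_2_3_codon_partition.txt -T $threads -m $model -redo -safe -pre 1_2_3_identical_TEST"
echo "$cmd"

# Only run when the IQ-TREE binary is actually present.
if [ -x "$iqt" ]; then $cmd; fi
```

Because free-rate models are left out of the candidate set, the selected models may differ from what `-m MFP` picks under `-p` or `-Q`.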