stephaneguindon / phyml

PhyML -- Phylogenetic estimation using (Maximum) Likelihood
GNU General Public License v3.0

How can I determine the optimal number of substitution rate categories to use? #197

Open liamxg opened 2 months ago

liamxg commented 2 months ago

Dear @stephaneguindon,

How can I determine the optimal number of substitution rate categories to use? Thanks.

stephaneguindon commented 2 months ago

The best approach here is to use the FreeRate model (--freerates option of the command line) combined with the -c X option (with X the number of rate classes). You can then compare a model with X classes to another one with X+1 classes using a likelihood ratio test. The likelihood ratio statistic follows a chi-square distribution with 2 degrees of freedom (although a model with X classes may be derived from one with X+1 classes by setting two rates to be equal and/or one class frequency to zero, which perhaps makes the chi-square distribution too conservative).
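A minimal sketch of this comparison in Python, assuming you have already run PhyML once per value of X and copied the final log-likelihoods by hand. The function names, the 0.05 threshold, the stopping rule (stop at the first non-significant step) and the log-likelihood values are all illustrative assumptions, not PhyML output or an official procedure. For 2 degrees of freedom the chi-square tail probability is exactly exp(-statistic / 2), so no statistics library is needed.

```python
import math


def lrt_two_df(lnl_fewer, lnl_more):
    """Likelihood ratio test between nested models differing by 2 parameters.

    lnl_fewer: log-likelihood of the model with X classes
    lnl_more:  log-likelihood of the model with X+1 classes
    Returns (statistic, p_value).
    """
    # The statistic is twice the log-likelihood gain; clamp tiny negative
    # values that can arise from imperfect numerical optimisation.
    stat = max(2.0 * (lnl_more - lnl_fewer), 0.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def select_num_classes(lnl_by_classes, alpha=0.05):
    """Add one class at a time until the extra class is not significant.

    lnl_by_classes maps a number of classes to a log-likelihood and is
    assumed to contain consecutive values of X (e.g. 2, 3, 4).
    """
    counts = sorted(lnl_by_classes)
    chosen = counts[0]
    for fewer, more in zip(counts, counts[1:]):
        _, p_value = lrt_two_df(lnl_by_classes[fewer], lnl_by_classes[more])
        if p_value >= alpha:
            break  # X+1 classes do not fit significantly better
        chosen = more
    return chosen


# Hypothetical log-likelihoods, for illustration only.
example_lnl = {2: -10250.0, 3: -10230.0, 4: -10229.0}
best = select_num_classes(example_lnl)
print("classes retained:", best)
```

As noted above, the chi-square approximation may be too conservative for this comparison, so a p-value close to the threshold is worth treating with caution.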

liamxg commented 2 months ago

Thanks, but could you please tell me how many times I should test to find the best X? @stephaneguindon

liamxg commented 1 month ago

Dear @stephaneguindon,

How do I change the "Compute approximate likelihood ratio test" option on the command line?

Best, Liam