veg / hyphy

HyPhy: Hypothesis testing using Phylogenies
http://www.hyphy.org

Question About BUSTED Omega Rate Class & Synonymous Rate Class #1570

Closed gykoh closed 1 year ago

gykoh commented 1 year ago

Hello!

I am exploring the effect of two parameters in BUSTED (omega rate classes and synonymous rate classes). I am trying to compare one run with 3 for both settings against one run with 2 for both settings. How should I compare them and decide which setting is ideal for my dataset?

Based on AIC-c scores, it seems that the 2nd run is the “better-fitting” model. The 1st run has AIC-c = 12058.16 (82 estimated parameters) while the 2nd run has AIC-c = 12051.65 (78 estimated parameters). The 2nd run does not have any note of “Collapsed rate class”, but the 1st run does, in the constrained model fit table.

Would this mean 2 synonymous rate classes and 2 omega rate classes are a “better” setting for my data rather than the default setting of 3 synonymous rate classes and 3 omega rate classes?

Thank you!

Settings For 1st Run

spond commented 1 year ago

Dear @gykoh,

Based on your description, the 2x2 model does offer a better fit to the data, as indicated by a lower (better) AIC-c score. If your primary interest is to detect selection, then you should be guided by the stability of the p-value: does the conclusion of the analysis depend significantly on how many rate classes you selected? Hopefully, there is not much difference between the 2x2 and 3x3 models, in which case the choice of that number is not very important.
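To put a number on "lower (better)", here is a minimal Python sketch that computes the AIC-c difference between the two runs reported in the question. It also converts the differences to Akaike weights, a standard summary that is not mentioned in this thread and is included only as an assumption about how one might read the gap. The function name `compare_aicc` is hypothetical.

```python
import math

def compare_aicc(scores):
    """Return (delta, weight) dictionaries for a mapping of model name -> AIC-c.

    delta  = AIC-c of the model minus the lowest AIC-c in the set
    weight = exp(-delta / 2), normalised to sum to 1 (Akaike weight)
    """
    best = min(scores.values())
    deltas = {name: score - best for name, score in scores.items()}
    relative = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(relative.values())
    weights = {name: r / total for name, r in relative.items()}
    return deltas, weights

# AIC-c values quoted in the question (3x3 is the default run, 2x2 the second run).
scores = {"3x3": 12058.16, "2x2": 12051.65}
deltas, weights = compare_aicc(scores)

for name in scores:
    print(f"{name}: delta AIC-c = {deltas[name]:.2f}, "
          f"Akaike weight = {weights[name]:.3f}")
```

With these inputs the 3x3 run sits about 6.5 AIC-c units above the 2x2 run. This only ranks the two fits; it says nothing about whether the selection test itself gives the same answer under both settings, which is the p-value comparison described above.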

If you are also interested in inferring the underlying rate distributions (including their complexity, i.e. the number of rate classes), then 2x2 is preferred to 3x3. I would also consider 2x3 and 3x2 models to look at other plausible alternatives.
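One way to set up the 2x2, 2x3, 3x2 and 3x3 comparison is to generate one command line per combination. The sketch below only builds and prints the commands. It assumes that the BUSTED command-line options are `--rates` for omega rate classes and `--syn-rates` for synonymous rate classes, and that `--alignment`, `--tree` and `--output` behave as in other HyPhy analyses; please check these against `hyphy busted --help` for your version. The file names and the "omega x synonymous" label order are placeholders chosen for illustration.

```python
import shlex
from itertools import product

def busted_commands(alignment, tree, class_counts=(2, 3)):
    """Build one BUSTED command per (omega classes, synonymous classes) pair.

    Labels are written as "<omega>x<synonymous>"; this ordering is an
    assumption made for this sketch, not something defined by HyPhy.
    """
    commands = {}
    for omega, syn in product(class_counts, repeat=2):
        label = f"{omega}x{syn}"
        commands[label] = [
            "hyphy", "busted",
            "--alignment", alignment,      # placeholder path
            "--tree", tree,                # placeholder path
            "--rates", str(omega),         # assumed flag: omega rate classes
            "--syn-rates", str(syn),       # assumed flag: synonymous rate classes
            "--output", f"busted_{label}.json",
        ]
    return commands

commands = busted_commands("my_alignment.fasta", "my_tree.nwk")
for label, argv in commands.items():
    # Print a copy-pasteable shell command; nothing is executed here.
    print(label, "->", shlex.join(argv))
```

After the runs finish, the p-value and AIC-c of each run can be read from the resulting output and compared side by side.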

Best, Sergei

gykoh commented 1 year ago

Thank you Professor Pond for the answer!