lamho86 / phylolm

GNU General Public License v2.0

trouble interpreting alpha #69

Closed reillybren closed 3 months ago

reillybren commented 3 months ago

Hi, I am new to using phylolm and statistical methods for phylogenetic regressions in general. I am running into some trouble trying to interpret alpha for my model.

I have a phylogeny with 48 species, a continuous predictor variable (scaled 0-1) and a binary outcome (7 1s and 41 0s, which may be biasing the results). When I use 'phyloglm', this is the output I get:

phyloglm(formula = X ~ closnesscentrality, data = cc, phy = tre, boot = 100)
       AIC     logLik Pen.logLik
    23.295     -8.648     -9.666

Method: logistic_MPLE
Mean tip height: 40.36665
Parameter estimate(s):
alpha: 0.0004625733
      bootstrap mean: 0.002489067 (on log scale, then back transformed)
      so possible upward bias.
      bootstrap 95% CI: (0.0004543538,0.9567479)

Coefficients:
                   Estimate   StdErr  z.value lowerbootCI upperbootCI  p.value
(Intercept)         -6.2937   2.3342  -2.6962    -13.8907     -3.4264 0.007013
closnesscentrality   9.9426   3.1622   3.1442      5.9304     22.6024 0.001666

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note: Wald-type p-values for coefficients, conditional on alpha=0.0004625733
      Parametric bootstrap results based on 100 fitted replicates

I have read two different ways to interpret alpha: one uses log(alpha), where a value close to -4 indicates no (?) phylogenetic signal, and the other uses t1/2 = ln(2)/alpha, where t1/2 is interpreted relative to the tree height. My t1/2 is much larger than my tree height, which would suggest little or no phylogenetic signal? Which interpretation is more appropriate? Would this suggest the outcome variable does not have phylogenetic signal at all?

cecileane commented 3 months ago

The best way to interpret α is to calculate the 'half-life', as you did: ln(2)/α = log(2)/0.0004625733 ≈ 1498.5, and compare it to your tree height, here 40.36665 (the mean tip height). This half-life is about 37 times the height of your tree, so your interpretation of no phylogenetic signal is correct.
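For reference, the half-life comparison can be reproduced in a few lines of base R. This is only a sketch using the two numbers copied from the output above; the variable names (`alpha`, `tree_height`, `half_life`, `ratio`) are chosen here for illustration and are not part of phylolm.

```r
# Values copied by hand from the phyloglm output in this thread.
alpha <- 0.0004625733        # estimated alpha
tree_height <- 40.36665      # "Mean tip height" reported by phyloglm

# Phylogenetic half-life: ln(2) / alpha (log() is the natural log in R).
half_life <- log(2) / alpha

# How many tree heights fit into one half-life.
ratio <- half_life / tree_height

cat(sprintf("half-life = %.1f, tree height = %.2f, ratio = %.1f\n",
            half_life, tree_height, ratio))
```

A ratio far above 1, as here, means the half-life is long compared to the time spanned by the tree.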

The other way is to compare the estimated α to the bounds imposed by the software when searching for the best-fitting value. If the estimated α is "stuck" at the lower bound, it means that it is as close to 0 as it can get, again meaning that the data contain no phylogenetic signal. Here, phyloglm has a default limit of ±4 for the (natural) log of αT, where T is the tree height, not for α alone (because the scale of α is the inverse of the scale of the tree height). We can check: log(αT) = log(0.0004625733 * 40.36665) = -3.98. That's indeed (super close to) the lower bound -4 imposed by phyloglm. So same conclusion: α wants to go to 0 to fit the data better, that is, no evidence of phylogenetic signal.
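The bound check can be sketched the same way in base R. The ±4 limit on log(αT) is taken from the explanation above; the variable names and the 0.05 tolerance used to decide "at the bound" are arbitrary choices made for this illustration, not something phylolm defines.

```r
# Values copied by hand from the phyloglm output in this thread.
alpha <- 0.0004625733
tree_height <- 40.36665      # T, the mean tip height
log_bound <- 4               # the +/-4 limit on log(alpha * T) described above

# Where the estimate sits on the log(alpha * T) scale.
log_alphaT <- log(alpha * tree_height)

# The same limits translated back to the alpha scale for this tree.
alpha_lower <- exp(-log_bound) / tree_height
alpha_upper <- exp(log_bound) / tree_height

# Assumption: treat "within 0.05 of -4" as stuck at the lower bound.
at_lower_bound <- abs(log_alphaT + log_bound) < 0.05

cat(sprintf("log(alpha*T) = %.2f\n", log_alphaT))
cat(sprintf("alpha range implied by the bound: (%.3g, %.3g)\n",
            alpha_lower, alpha_upper))
cat("estimate at lower bound:", at_lower_bound, "\n")
```

Expressing the limits on the α scale makes it easy to see that the estimate 0.0004625733 sits just above the smallest value the search allows for a tree of this height.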