lamho86 / phylolm

GNU General Public License v2.0

Phyloglm produces different results with normalized and raw predictor variables #68

Open NB-Bio opened 5 months ago

NB-Bio commented 5 months ago

Hi, I have one variable (raw) and a rescaled version of it ranging from 0 to 10 (normalized), obtained with the min-max normalization formula: normalized = (raw_i - min(raw)) / (max(raw) - min(raw)).
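For reference, a minimal sketch of this rescaling in R. The helper name `min_max`, its `upper` argument, and the sample values are hypothetical and not taken from my data. Note that the formula as written maps onto [0, 1]; the sketch assumes the 0 to 10 range comes from multiplying the result by 10.

```r
# Hypothetical helper: min-max rescaling of a numeric vector onto [0, upper].
min_max <- function(x, upper = 1) {
  rng <- range(x, na.rm = TRUE)              # c(min, max), ignoring NAs
  upper * (x - rng[1]) / (rng[2] - rng[1])   # linear map: min -> 0, max -> upper
}

x <- c(2.5, 4.0, 7.5, 10.0)   # made-up values, for illustration only
scaled_01 <- min_max(x)               # the formula as written, range 0 to 1
scaled_10 <- min_max(x, upper = 10)   # same map stretched to 0 to 10
```

Either way the result is an affine function of the input, which is why I expected the model fit to be unaffected.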

Since raw and normalized are linear re-parameterizations of each other, I expected the same AIC from fitting a binary phylogenetic logistic regression through phyloglm with each of them as predictor and presence as the response:

raw <- phyloglm(presence ~ raw, data = git, phy = git.tree, method = "logistic_MPLE", btol = 30)
normalized <- phyloglm(presence ~ normalized, data = git, phy = git.tree, method = "logistic_MPLE", btol = 30)

However, the two models have very different AICs:

raw$aic: 1168.742
normalized$aic: 1112.437

As a comparison, if I run a non-phylogenetic regression with glm, I get the same AIC (1158) for both predictors:

glm(presence~raw, data = git, family = "binomial")
glm(presence~normalized, data = git, family = "binomial")
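The glm behaviour can be reproduced without my data. The sketch below uses simulated values only (the sample size, coefficients, and the data frame name `d` are made up for illustration) and checks that the AIC of an ordinary logistic regression does not change when the predictor is min-max rescaled.

```r
set.seed(1)
n <- 200

# Simulated data, not the data from this issue.
raw        <- rnorm(n, mean = 50, sd = 10)
presence   <- rbinom(n, 1, plogis(-5 + 0.1 * raw))
normalized <- 10 * (raw - min(raw)) / (max(raw) - min(raw))   # assumed 0-10 rescaling
d <- data.frame(presence, raw, normalized)

fit_raw  <- glm(presence ~ raw,        data = d, family = "binomial")
fit_norm <- glm(presence ~ normalized, data = d, family = "binomial")

# With an intercept in the model, an affine change of the predictor leaves
# the fitted probabilities, and therefore the likelihood and AIC, unchanged.
same_aic <- isTRUE(all.equal(AIC(fit_raw), AIC(fit_norm), tolerance = 1e-6))

# Only the slope changes, by the scale factor of the transformation.
scale_factor <- (max(raw) - min(raw)) / 10
same_slope <- isTRUE(all.equal(unname(coef(fit_norm)["normalized"]),
                               unname(coef(fit_raw)["raw"]) * scale_factor,
                               tolerance = 1e-4))
```

Both `same_aic` and `same_slope` should come out TRUE here, which is the invariance I expected phyloglm to show as well.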

This difference in AIC between the two models (raw and normalized) is problematic when I try to compare raw and normalized with some other predictor (let's call it other), since I get the odd situation in which, for example, raw is a better predictor than other but normalized is a worse one (when, as far as I can understand, they should have the same performance).

I don't know what I am missing, but if someone wants to explore the data, they are available on my drive at this link: https://drive.google.com/file/d/192lPTVECtIZZHkc7hhwyvXYDBwL5kVvP/view?usp=drive_link

Thank you so much,