lamho86 / phylolm

GNU General Public License v2.0

Error when using continuous predictor #21

Closed: minicola2 closed this issue 4 years ago

minicola2 commented 5 years ago

Hello,

I am trying to run a logistic phylogenetic regression with 'phyloglm', which works fine for discrete predictors. However, as soon as I try to run it with a continuous predictor, I get the following error:

Error in three.point.compute(temp[1:2], (y - mu)/dia, mu * (1 - mu) * : NA/NaN/Inf in foreign function call (arg 11)

I tried removing all NAs (using na.omit), but this does not change anything. Thank you for any help.

Michaël

cecileane commented 5 years ago

Could you try standardizing your continuous predictor? Or at least centering it first (subtracting the mean)? Centering your predictor makes the intercept correspond to a prediction at the central value of the predictor, but won't change the interpretation of the other coefficients. It might, however, reduce numerical issues.
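A minimal sketch of both transformations in base R, assuming the predictor is a numeric column named predictor in a data frame dat (hypothetical names, borrowed from the help-page example later in this thread; the values here are simulated with rnorm, not real data):

```r
# Hypothetical data frame with simulated values; replace with your own data.
set.seed(1)
dat <- data.frame(predictor = rnorm(50, mean = 120, sd = 30))

# Centering: subtract the mean, so the intercept refers to the average predictor value.
dat$predictor.c <- dat$predictor - mean(dat$predictor)

# Standardizing: center, then divide by the standard deviation.
# scale() returns a one-column matrix, so as.numeric() turns it back into a vector.
dat$predictor.s <- as.numeric(scale(dat$predictor))

# The model would then use the transformed column, e.g. (not run here):
# fit <- phyloglm(trait01 ~ predictor.s, phy = tre, data = dat)
```

Centering leaves the slope unchanged; standardizing rescales it to "change in log-odds per standard deviation of the predictor".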

minicola2 commented 5 years ago

Dear Cecileane

I tried both standardizing and centering, but I get the same error. Any other ideas that might work? Thanks

Michaël

minicola2 commented 5 years ago


Dear Cecileane

Do you have any other ideas that might solve the problem?

Kind regards

Michaël

cecileane commented 5 years ago

No. Without any other information about the problem, no idea comes to mind.

minicola2 commented 4 years ago

Dear Cecileane

After trying some different datasets, it seems that the problem is limited to this dataset, possibly because of its size. Would you be willing to look at the dataset to see whether you can identify the problem?

Kind regards

Michaël

cecileane commented 4 years ago

Sorry for the delay, things have been very busy. Yes I can try now!

hlken commented 4 years ago

Hello Cecileane and Michaël,

We are having a similar problem with one of our datasets. Could you please let me know if you find a solution?

Kind regards, Haley

cecileane commented 4 years ago

Thanks so much, Michaël, for sharing. Below I reproduce the same issue on a small, simulated data set. It turns out that this error occurs when the response variable is a factor! When the response variable is coded as a numeric variable (with only 0 and 1 values), there is no error. Below is the same example as in the help page for phyloglm, with a few added lines to show the error.

The example in the help page has this, and runs great:

library(ape)      # rtree()
library(phylolm)  # rTrait(), rbinTrait(), phyloglm()
set.seed(123456)
tre = rtree(50)
x = rTrait(n=1,phy=tre)
X = cbind(rep(1,50),x)
y = rbinTrait(n=1,phy=tre, beta=c(-1,0.5), alpha=1 ,X=X)
dat = data.frame(trait01 = y, predictor = x)
fit = phyloglm(trait01~predictor,phy=tre,data=dat,boot=100)
summary(fit)
coef(fit)
vcov(fit)

Now I'll make the response a factor, and the error appears:

dat$trait01.factor = factor(dat$trait01)
str(dat) # trait01 is numerical, trait01.factor is a factor with 2 levels
'data.frame':   50 obs. of  3 variables:
 $ trait01       : num  1 1 0 0 1 0 0 0 0 0 ...
 $ predictor     : num  -0.842 0.275 -0.36 -0.899 2.043 ...
 $ trait01.factor: Factor w/ 2 levels "0","1": 2 2 1 1 2 1 1 1 1 1 ...

Then I get an error when trying to run this:

fit.factor = phyloglm(trait01.factor~predictor, phy=tre, data=dat)
Warning in Ops.factor(y, mu): '-' not meaningful for factors
Error in three.point.compute(temp[1:2], (y - mu)/dia, mu * (1 - mu) * : NA/NaN/Inf in foreign function call (arg 11)

At least the function checks that the response takes only two values, and that these values are 0 and 1 (see below), so when it runs, it works as expected. But it chokes when the response is a factor, which is really not good for users.

phyloglm(2*trait01 ~predictor, phy=tre, data=dat)   # error: wants values 0,1 not 0,2
phyloglm(trait01 + 1 ~predictor, phy=tre, data=dat) # same good error: requires values 0,1 not 1,2.
Error in phyloglm(trait01 + 1 ~ predictor, phy = tre, data = dat): The model by Ives and Garland requires a binary response (dependent variable).
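Until this is fixed, a workaround consistent with the diagnosis above is to convert the factor response back to a numeric 0/1 vector before calling phyloglm. A minimal sketch, assuming a two-level factor whose second level should be coded as 1 (the data and column names are hypothetical). Note that as.numeric() applied directly to a factor returns the internal level codes 1 and 2, not 0 and 1, which would trigger the "requires a binary response" error shown above.

```r
# Hypothetical two-level factor response, as produced by factor(dat$trait01).
dat <- data.frame(trait01.factor = factor(c(1, 1, 0, 0, 1, 0)))

# Pitfall: as.numeric() on a factor gives the internal level codes (1 and 2).
codes <- as.numeric(dat$trait01.factor)

# Option 1: go through the labels; works when the levels are literally "0" and "1".
dat$trait01 <- as.numeric(as.character(dat$trait01.factor))

# Option 2: works for any two labels (e.g. "absent"/"present"): 1 = second level.
dat$trait01.b <- as.integer(dat$trait01.factor == levels(dat$trait01.factor)[2])

# Then fit with the numeric 0/1 column, e.g. (not run here):
# fit <- phyloglm(trait01 ~ predictor, phy = tre, data = dat)
```

Option 2 makes the coding explicit: whichever level comes second in levels() becomes 1, so reorder the levels first if a different category should be the "success".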

@lamho86: would you have the time to fix this in the code?

lamho86 commented 4 years ago

The issue has been fixed. Thanks everyone. Please let me know if something else comes up.

(Posted by email, in reply to Cécile Ané's comment of Oct 24, 2019, 12:31 PM.)