jashu / beset

Best Subset Predictive Modeling

Can't calculate pseudo R2 for a ZINB done with pscl::zeroinfl #1

Closed: diogoprov closed this issue 5 years ago

diogoprov commented 5 years ago

I'm not sure what I'm doing wrong, but I can't calculate a pseudo R2 for a zero-inflated model built with pscl::zeroinfl using beset::r2d(). The package documentation doesn't make clear whether the object returned by zeroinfl can be passed to r2d.

I'd appreciate it if you could clarify that.

jashu commented 5 years ago

You can pass a "zeroinfl" object to r2d, yes. For example,

model <- pscl::zeroinfl(art ~ ., data = pscl::bioChemists, dist = "negbin")
beset::r2d(model)

should produce this output:

Fit R-squared: 0.12

However, I have not yet implemented cross-validation for this model class, so if you set the parameter cv = TRUE you will get an error. Otherwise, the function should work. Perhaps you set model = FALSE when you ran zeroinfl? The function needs to be able to retrieve the model frame to work. I should add some defensive code against that.
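The defensive check mentioned above could look something like this minimal sketch. check_model_frame() is a hypothetical helper, not part of beset; it assumes the fitted object stores its model frame in a `model` element, which base R's lm() does (and, as far as I know, pscl::zeroinfl() does too) when model = TRUE, the default. The example uses lm() on the built-in mtcars data so it runs without extra packages.

```r
# Hypothetical helper (not part of beset): fail early with a clear message if
# a fitted model was created without its model frame, e.g. with model = FALSE.
check_model_frame <- function(fit) {
  if (is.null(fit$model)) {
    stop("Model frame not found; refit with model = TRUE.", call. = FALSE)
  }
  invisible(fit)
}

# lm() keeps the model frame in fit$model only when model = TRUE (the default)
fit_ok  <- lm(mpg ~ wt, data = mtcars)
fit_bad <- lm(mpg ~ wt, data = mtcars, model = FALSE)

check_model_frame(fit_ok)  # returns the fit invisibly, no error
msg <- tryCatch(check_model_frame(fit_bad), error = conditionMessage)
print(msg)  # "Model frame not found; refit with model = TRUE."
```

Failing with an explicit message is friendlier than letting a later step break on a missing response.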

If this does not resolve your issue, please include an example that I can use to reproduce the problem you're having.

diogoprov commented 5 years ago

Thanks for the quick reply. I'm using this code to run the analysis:

# Packages
library(vegan)  # decostand()
library(pscl)   # zeroinfl()
library(beset)  # r2d()

# Data processing

Carabidae <- read.csv("Carabidae_MG.csv", sep=";")
Carab_red <- dplyr::select(Carabidae, CLYPPT_M_s, SLTPPT_M_s, delta_03, std_03, std_07, std_10, std_16, delta_07, delta_10, delta_16, Desenvolvimento_linear_amostrado)
head(Carab_red)

carab_std <- decostand(Carab_red, "stand")
nova_tab <- cbind(carab_std, Carabidae$Riqueza_Carabidae)
head(nova_tab)

# Modelling

ZIP <- zeroinfl(Carabidae$Riqueza_Carabidae ~ CLYPPT_M_s + SLTPPT_M_s + delta_03 + delta_07 + delta_10 + delta_16 + Desenvolvimento_linear_amostrado, data=nova_tab, dist = "poisson", link = "logit")

and this is the error I get when I try to calculate the R2

r2d(ZIP)
Error in stats::dpois(y, lambda = y_bar, log = TRUE) : 
  Non-numeric argument to mathematical function
In addition: Warning message:
In mean.default(y) : argument is not numeric or logical: returning NA

The same error is returned when I run a ZINB.

I can email you the dataset so you can run the code yourself, since GitHub doesn't allow attaching .csv files.
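The traceback above suggests that r2d() evaluates an intercept-only Poisson log-likelihood, dpois(y, lambda = y_bar, log = TRUE), with y_bar the mean response, so a non-numeric y fails at that step. As a rough illustration only (an assumption about the general approach, not beset's actual code), a deviance-based R² for an ordinary Poisson GLM can be built from that kind of null model. poisson_r2d() is a hypothetical name and the data are simulated.

```r
# Sketch (not beset's code): deviance-based R-squared for a Poisson model,
# with the null log-likelihood computed from the mean response, like the
# dpois(y, lambda = y_bar) call in the traceback.
poisson_r2d <- function(y, loglik_model) {
  stopifnot(is.numeric(y))  # the error above appears to come from a non-numeric y
  y_bar   <- mean(y)
  ll_null <- sum(stats::dpois(y, lambda = y_bar, log = TRUE))
  ll_sat  <- sum(stats::dpois(y, lambda = y, log = TRUE))  # saturated model
  # R^2 = 1 - D_model / D_null, where D = 2 * (ll_sat - ll)
  1 - (ll_sat - loglik_model) / (ll_sat - ll_null)
}

# Simulated toy data, for illustration only
set.seed(1)
x <- rnorm(100)
y <- rpois(100, lambda = exp(0.5 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)

r2 <- poisson_r2d(y, as.numeric(logLik(fit)))
# For a Poisson GLM this matches 1 - residual deviance / null deviance
all.equal(r2, 1 - fit$deviance / fit$null.deviance)  # TRUE
```

A zero-inflated model would need its own log-likelihood in place of logLik() for a plain GLM; this sketch does not cover that case.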

jashu commented 5 years ago

OK, the likely problem is that your model formula pulls the response variable and the predictor variables from two different data frames: it takes Riqueza_Carabidae from the Carabidae data frame and all the predictors from the nova_tab data frame. My function assumes that all of the variables are in the data frame passed to the data argument.
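A toy illustration of that failure mode, using base R only: if a function takes the response name from the left-hand side of the formula and looks it up in `data`, then Carabidae$Riqueza_Carabidae is not a column of nova_tab, so the lookup returns NULL and mean() emits the same warning seen above. response_name() and the small data frames are hypothetical stand-ins; beset's internals may differ.

```r
# Hypothetical helper: the response as written on the formula's left-hand side
response_name <- function(formula) deparse(formula[[2]])

# Toy stand-ins for the two data frames in the thread
Carabidae <- data.frame(Riqueza_Carabidae = c(0, 2, 1, 0, 3))
nova_tab  <- data.frame(x1 = 1:5)  # predictors only, no response column

f_bad  <- Carabidae$Riqueza_Carabidae ~ x1
f_good <- Riqueza_Carabidae ~ x1

response_name(f_bad)                         # "Carabidae$Riqueza_Carabidae"
y_bad <- nova_tab[[response_name(f_bad)]]    # NULL: no such column in nova_tab
mean(y_bad)  # NA, warning: "argument is not numeric or logical: returning NA"

# Fix: store the response in the same data frame and name it in the formula
nova_tab$Riqueza_Carabidae <- Carabidae$Riqueza_Carabidae
y_good <- nova_tab[[response_name(f_good)]]
mean(y_good)                                 # 1.2
```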

If you name the response variable when you cbind it to the predictor frame and don't reference the Carabidae data frame in your formula, it should work:

# Name the response column so the formula can find it in nova_tab
nova_tab <- cbind(carab_std, Riqueza_Carabidae = Carabidae$Riqueza_Carabidae)
ZIP <- zeroinfl(Riqueza_Carabidae ~ CLYPPT_M_s + SLTPPT_M_s + delta_03 + delta_07 + delta_10 + delta_16 + Desenvolvimento_linear_amostrado, data = nova_tab, dist = "poisson", link = "logit")
r2d(ZIP)

diogoprov commented 5 years ago

Yes! It indeed worked just fine. Thanks a lot!