pbreheny / visreg

Visualization of regression functions
http://pbreheny.github.io/visreg/

Error with mgcv fitted gam #98

Open nilescbn opened 2 years ago

nilescbn commented 2 years ago

First, thank you for this package and the documentation. Both have really benefited me.

I've typically had no issue using the package. Yesterday I started getting the following error message when trying to run visreg() on a gam model fitted with the mgcv package:

Error in exists(tail(as.character(CALL$data), 1), call.env) : invalid first argument

This is perplexing because I was able to run visreg on mgcv objects, on the same fitted models, as recently as yesterday. I have tried variations on the models, but I'm fairly certain visreg worked on the same versions that now give me the error message.

I also tried the development version of visreg (installed with install_github), but I get the same result.

I've spent ~30 minutes looking on Stack Overflow and elsewhere for clues and so far haven't had any luck.

I just downloaded the mgcViz package and it works with my models.

Thank you for your time. I don't know that this is a bug (I'm guessing not), but I would appreciate any tips you might have.

I'm using the latest version of RStudio (1.4.1717). And here is my session info:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19041)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] visreg_2.7.0.2 mgcv_1.8-37    nlme_3.1-152  

loaded via a namespace (and not attached):
 [1] lattice_0.20-44 digest_0.6.28   grid_4.1.1      evaluate_0.14   rlang_0.4.11   
 [6] renv_0.14.0     Matrix_1.3-4    rmarkdown_2.11  splines_4.1.1   tools_4.1.1    
[11] xfun_0.26       yaml_2.2.1      fastmap_1.1.0   compiler_4.1.1  htmltools_0.5.2
[16] knitr_1.35 

pbreheny commented 2 years ago

Without a minimal reproducible example, I cannot possibly guess why your code is producing an error.

nilescbn commented 2 years ago

Okay, I understand. I couldn't think of how to put one together quickly, and I thought there might be a known issue with GAMs, as there seems to have been in the past. I will think harder about how to make a reproducible example.

In the meantime, to show that the model object I'm having issues with is valid, here's the output from summary():

Family: gaussian 
Link function: identity 

Formula:
log(lbs_dsrk) ~ s(X_km, Y_km) + s(set_year) + set_month_fct + 
    s(SET_DEPTH) + s(HAUL_DURATION)

Parametric coefficients:
                  Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.139194   0.008832 355.444  < 2e-16 ***
set_month_fct.L   0.515514   0.032974  15.634  < 2e-16 ***
set_month_fct.Q   2.120625   0.040316  52.600  < 2e-16 ***
set_month_fct.C  -0.600362   0.031785 -18.889  < 2e-16 ***
set_month_fct^4  -0.162067   0.032327  -5.013 5.37e-07 ***
set_month_fct^5  -0.058738   0.031064  -1.891  0.05865 .  
set_month_fct^6  -0.399912   0.030937 -12.927  < 2e-16 ***
set_month_fct^7  -0.070763   0.030931  -2.288  0.02216 *  
set_month_fct^8   0.081701   0.030662   2.665  0.00771 ** 
set_month_fct^9  -0.037612   0.030387  -1.238  0.21580    
set_month_fct^10 -0.252248   0.030255  -8.337  < 2e-16 ***
set_month_fct^11 -0.063196   0.030749  -2.055  0.03986 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                    edf Ref.df      F p-value    
s(X_km,Y_km)     28.518 28.982  85.76  <2e-16 ***
s(set_year)       8.727  8.976 148.06  <2e-16 ***
s(SET_DEPTH)      8.225  8.787 481.93  <2e-16 ***
s(HAUL_DURATION)  8.629  8.942  94.79  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.212   Deviance explained = 21.3%
GCV = 3.2023  Scale est. = 3.1972    n = 41182

The model object is called fit_lbs. The error occurs when I try running visreg like so.

visreg(fit_lbs)

Its class is "gam" "glm" "lm".

Again, I will try and reproduce the error with a simpler model.

Thank you.

pbreheny commented 2 years ago

Well, I doubt it has anything to do with the model itself, but I don't even know what the call to gam() looks like.

nilescbn commented 2 years ago

I see. It was this (and again, this is with the mgcv package):

fit_lbs <- gam(log(lbs_dsrk) ~ s(X_km, Y_km) + s(set_year) + set_month_fct +
                 s(SET_DEPTH) + s(HAUL_DURATION),
               data = hauls_analysis[pos_dsrk == 1 & rev_category_dsrk == "zero", ],
               family = gaussian)

It's also happening with a separate logistic (binomial) model fitted to the same data.

Again, while I can't be 100% sure it was this exact version of the model, I know visreg worked perfectly on something very similar just yesterday.

pbreheny commented 2 years ago

I see; the issue is with

data = hauls_analysis[pos_dsrk == 1 & rev_category_dsrk == "zero", ]

I.e., applying an operation to the data during the call to gam(). This used to work fine, but R 4.0 changed some things and not every bug has been tracked down yet. Thank you very much for bringing this to my attention -- I'll fix it as soon as I can. In the meantime, you can avoid the error by subsetting the data outside the call to gam():

Sub <- subset(hauls_analysis, pos_dsrk == 1 & rev_category_dsrk == "zero")
fit <- gam(..., data=Sub)
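
A minimal base-R sketch of why this particular message appears, assuming (as the error text above suggests) that visreg looks up the data object using the last element of as.character(CALL$data). The calls below are illustrative only: quote() captures them without evaluating anything, and no model is fitted. With a bare object name, the last element is that name; with a subsetting expression such as hauls_analysis[..., ], it is the empty column index after the comma, and exists("") fails with the reported message.

```r
# Illustrative calls only: quote() does not evaluate them, so
# hauls_analysis and pos_dsrk do not need to exist.
call_plain  <- quote(gam(y ~ s(x), data = hauls_analysis))
call_subset <- quote(gam(y ~ s(x), data = hauls_analysis[pos_dsrk == 1, ]))

# Mimic the lookup shown in the error: last element of as.character(CALL$data)
plain_name  <- tail(as.character(call_plain$data), 1)
subset_name <- tail(as.character(call_subset$data), 1)

print(plain_name)   # the bare object name, which exists() can look up
print(subset_name)  # an empty string: the missing column index after the comma

# exists() rejects an empty string; capture the error message instead of stopping
err_msg <- tryCatch(exists(subset_name), error = function(e) conditionMessage(e))
print(err_msg)
```

Subsetting first, as suggested above, means CALL$data is the bare name Sub, so the lookup receives a valid object name.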

nilescbn commented 2 years ago

Okay, that is something I did change yesterday afternoon (i.e., I started using a subset of the data in the model). I would never have thought of that on my own, so thank you for the quick replies. Very much appreciated.

And, yes, visreg() is indeed working now after following your recommendation.