vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference.
https://marginaleffects.com

avg_comparisons() within a function after fitting an mfp model #845

Closed. Matteo21Q closed this issue 1 year ago.

Matteo21Q commented 1 year ago

Hi, I am using marginaleffects together with mfp. mfp fits models (e.g., glm) that apply fractional polynomial transformations to the covariates. It returns an object containing the standard glm fit, from which I can easily get the average marginal risk difference:

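library(mfp)              # also loads survival
library(marginaleffects)

set.seed(20)
X <- rnorm(100)
Y <- rbinom(100, 1, 0.5 + 0.001 * X)
dd <- data.frame(X, Y)
ff <- as.formula(Y ~ fp(X, df = 4))

mfp.fit <- mfp(ff, data = dd, family = "binomial")
glm.fit <- mfp.fit$fit   # the underlying glm fit
class(glm.fit)
glm.fit$formula

# Average risk difference when X changes from 1 to 2
mef <- avg_comparisons(glm.fit, variables = list(X = c(1, 2)))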

However, I need to do this inside a function, and the data set used to fit the model is created inside that function. When I wrap the same code in a function, avg_comparisons() throws an error suggesting that I pass newdata explicitly:

testfun <- function(dd) {
  dd2 <- dd
  colnames(dd2)[1] <- "Z"   # rename X to Z
  ff2 <- as.formula(Y ~ fp(Z, df = 4))

  mfp.fit2 <- mfp(ff2, data = dd2, family = "binomial")
  glm.fit2 <- mfp.fit2$fit

  mef <- avg_comparisons(glm.fit2, variables = list(Z = c(1, 2)))
  mef
}

testfun(dd)

But passing newdata explicitly doesn't seem to fix it:

testfun2 <- function(dd) {
  dd2 <- dd
  colnames(dd2)[1] <- "Z"   # rename X to Z
  ff2 <- as.formula(Y ~ fp(Z, df = 4))

  mfp.fit2 <- mfp(ff2, data = dd2, family = "binomial")
  glm.fit2 <- mfp.fit2$fit

  # Same call, but with the data passed explicitly as newdata
  mef <- avg_comparisons(glm.fit2, variables = list(Z = c(1, 2)), newdata = dd2)
  mef
}

testfun2(dd)
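One general R mechanism that is likely involved here (an assumption; the thread does not confirm it) is how the data behind a fitted model is located. The stored call records only the *name* of the data argument, while the data frame itself lives in the function's evaluation frame, which is captured as the environment of any formula written inside that function. The base-R sketch below uses a plain glm() and a hypothetical helper, make_fit(), to show the data being recovered by evaluating the call's data argument in the formula's environment. This is not necessarily how insight works internally, and with mfp the glm formula is built inside mfp itself, so this route may not lead back to dd2.

```r
# Base R only; make_fit() is a hypothetical helper for illustration.
make_fit <- function(dd) {
  dd2 <- dd
  colnames(dd2)[1] <- "Z"
  # The formula is created here, so its environment is this call's frame
  glm(Y ~ Z, data = dd2, family = binomial)
}

set.seed(20)
X <- rnorm(100)
Y <- rbinom(100, 1, 0.5 + 0.001 * X)
dd <- data.frame(X, Y)

fit <- make_fit(dd)

# The stored call records only the symbol `dd2`, not the data itself
fit$call$data
# dd2

# Evaluating that symbol in the formula's environment recovers the data frame
recovered <- eval(fit$call$data, environment(formula(fit)))
colnames(recovered)
# [1] "Z" "Y"
```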

My sessionInfo() output:

> sessionInfo()
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mfp_1.5.2.2            survival_3.5-5         marginaleffects_0.13.0

loaded via a namespace (and not attached):
 [1] backports_1.4.1   Matrix_1.5-4      lattice_0.21-8    splines_4.3.0     generics_0.1.3    cli_3.6.1         grid_4.3.0        data.table_1.14.8 compiler_4.3.0   
[10] rstudioapi_0.14   tools_4.3.0       checkmate_2.1.0   Rcpp_1.0.10       rlang_1.1.0       insight_0.19.

Thanks in advance for any help!

vincentarelbundock commented 1 year ago

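The problem is that insight::get_data() does not return a data frame with the appropriate column names: inside the function it cannot recover the original data from the environment, so it falls back to the model frame, which holds the transformed covariate instead of a column named Z (see the warning and output below). I will close this because it is not a bug in marginaleffects and it should be fixed upstream. Unfortunately, I don’t have time to make a pull request to fix this in insight right now, but the code there is usually very easy to read, so you may want to give it a shot if you have the time and interest.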

library(mfp)
# Loading required package: survival

set.seed(20)
X <- rnorm(100)
Y <- rbinom(100, 1, 0.5 + 0.001 * X)
dd <- data.frame(X, Y)

testfun <- function(dd) {
    dd2 <- dd
    colnames(dd2)[1] <- "Z"
    ff2 <- as.formula(Y ~ fp(Z, df = 4))
    mfp.fit2 <- mfp(ff2, data = dd2, family = "binomial")
    glm.fit2 <- mfp.fit2$fit
    insight::get_data(glm.fit2)
}

testfun(dd) |> head()
# Warning: Could not recover model data from environment. Please make sure your
#   data is available in your workspace.
#   Trying to retrieve data from the model frame now.
#   Y             
# 1 0 4.062685....
# 2 1 2.314075....
# 3 1 4.685465....
# 4 0 1.567406....
# 5 1 2.453433....
# 6 1 3.469606....
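The warning above shows insight::get_data() falling back to the model frame. A model frame stores each formula term as evaluated, so when a covariate enters only through a transformation, the raw variable (Z here) no longer appears as a column. The base-R sketch below uses a hand-written power term, I((Z + 4)^2), as a stand-in for an mfp transformation; that specific term is an assumption for illustration and is not what mfp actually generates.

```r
set.seed(20)
dd2 <- data.frame(Z = rnorm(100))
dd2$Y <- rbinom(100, 1, 0.5 + 0.001 * dd2$Z)

# Hypothetical stand-in for a fractional-polynomial term
fit <- glm(Y ~ I((Z + 4)^2), data = dd2, family = binomial)

mf <- model.frame(fit)
colnames(mf)            # the response plus the transformed term, not "Z"
"Z" %in% colnames(mf)   # FALSE: the raw variable is not in the model frame
```

A tool that needs to set Z to 1 and then 2, as avg_comparisons() does here, cannot do so from such a data frame, which is consistent with the failure reported in this issue.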