vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference.
https://marginaleffects.com

slopes() overwriting variable column with repeated first value ({tidymodels}) #1209

Closed. agmath closed this issue 4 weeks ago.

agmath commented 2 months ago

I am enjoying this package. Thank you for developing and maintaining it!

When using the slopes() function with a model created in the {tidymodels} framework, the column of the output named after the variable (here, x) contains only the first observed value of that variable, repeated on every row. I would expect that column to contain the observed values of the variable, as in the data frame passed as newdata. That is in fact the behavior when I fit the same model with base R's lm() directly.

Below is a reproducible example; I hope it is appropriate. The column in question is the x column of the mfx data frame. I'm currently working around this by mutating the original x column of the my_data data frame back onto mfx.

Thank you for any help or guidance you can provide!

library(tidyverse)
library(tidymodels)
library(marginaleffects)
library(reprex)

#Generate 50 (x,y)-pairs
nobs <- 50
my_data <- tibble(
  x = runif(nobs, 0, 10),
  y = -(x - 11)^2 + 100 + rnorm(nobs, 0, 25)
)

#Build and fit tidymodels model
lr_spec <- linear_reg()
lr_rec <- recipe(y ~ x, data = my_data) %>%
  step_poly(x, degree = 2)
lr_wf <- workflow() %>%
  add_model(lr_spec) %>%
  add_recipe(lr_rec)

lr_fit <- lr_wf %>%
  fit(my_data)

#Obtain marginal effects of x on y
mfx <- lr_fit %>%
  slopes(newdata = my_data, variables = "x") %>%
  as_tibble()

mfx %>%
  select(term, estimate, conf.low, conf.high, x)
#> # A tibble: 50 × 5
#>    term  estimate conf.low conf.high     x
#>    <chr>    <dbl>    <dbl>     <dbl> <dbl>
#>  1 x         9.91     6.97      12.8  6.20
#>  2 x         5.98    -2.14      14.1  6.20
#>  3 x        16.1      8.45      23.7  6.20
#>  4 x        11.4      9.17      13.7  6.20
#>  5 x        15.6      8.64      22.5  6.20
#>  6 x         5.86    -2.43      14.1  6.20
#>  7 x        10.2      7.50      12.9  6.20
#>  8 x        11.8      9.41      14.2  6.20
#>  9 x        10.4      7.80      12.9  6.20
#> 10 x         7.68     1.98      13.4  6.20
#> # ℹ 40 more rows

Created on 2024-08-31 with reprex v2.1.0
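The workaround described above (copying the observed x values from my_data back onto mfx) can be sketched as follows. This is a minimal sketch, assuming the slopes() output carries a rowid column giving each row's position in newdata; the small data frames are hypothetical stand-ins for my_data and mfx, and their x values are made up for illustration.

```r
# Minimal sketch of the workaround: restore the observed x values in slopes() output.
# Assumption: the output has a rowid column indexing rows of newdata.
# Hypothetical stand-in for my_data (x values are illustrative only).
my_data <- data.frame(x = c(6.20, 1.37, 9.05))

# Hypothetical stand-in for the buggy output: x repeats the first observed value.
mfx <- data.frame(
  rowid = 1:3,
  term = "x",
  x = rep(my_data$x[1], 3)
)

# Index by rowid rather than relying on row order, so the fix still holds
# if rows were reordered or filtered after slopes() returned.
mfx$x <- my_data$x[mfx$rowid]
mfx
```

In the reprex's pipe style, the same fix would read `mfx %>% mutate(x = my_data$x[rowid])`. Matching on rowid instead of pasting the column back by position is the safer choice if mfx has been sorted or filtered.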

vincentarelbundock commented 2 months ago

Thanks for the report and reproducible example!

I'll do a deep dive soon, but at first glance I'm 94% sure this is an innocuous indexing error. In all likelihood, slopes are evaluated at the actually observed x values, and the only thing wrong is the values in the x column of the output, not the estimates.

I'll confirm as soon as possible with a fix.
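One way to check the claim that the slopes are evaluated at the observed x values is to compute reference slopes of the fitted curve at each observed x and compare them with the estimate column. The sketch below assumes that the recipe's step_poly(x, degree = 2) yields the same fitted curve as base R's lm(y ~ poly(x, 2)) (both fit a quadratic in x with an intercept). It uses a hypothetical seed, since the original reprex sets none, so its numbers will not match the output shown above.

```r
# Minimal sketch: reference slopes dy/dx at each observed x for a quadratic fit.
# Assumption: equivalent to the recipe's step_poly(x, degree = 2) model.
set.seed(1)  # hypothetical seed; the original reprex does not set one
nobs <- 50
my_data <- data.frame(x = runif(nobs, 0, 10))
my_data$y <- -(my_data$x - 11)^2 + 100 + rnorm(nobs, 0, 25)

fit <- lm(y ~ poly(x, 2), data = my_data)

# Central finite difference of the fitted curve at each observed x
# (exact for a quadratic, up to floating-point rounding).
h <- 1e-4
pred_at <- function(xs) predict(fit, newdata = data.frame(x = xs))
ref_slope <- (pred_at(my_data$x + h) - pred_at(my_data$x - h)) / (2 * h)

# Cross-check against the closed form b1 + 2 * b2 * x from a raw-polynomial fit,
# which describes the same fitted curve as the orthogonal-polynomial fit.
b <- coef(lm(y ~ x + I(x^2), data = my_data))
exact_slope <- b[["x"]] + 2 * b[["I(x^2)"]] * my_data$x

all.equal(unname(ref_slope), exact_slope, tolerance = 1e-6)  # expected: TRUE
```

Run on the reprex's own my_data instead of this seeded copy, ref_slope could be compared with mfx$estimate row by row; close agreement (up to the finite-difference error inside slopes()) would indicate that the estimates were computed at the observed x values despite the constant x column.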

vincentarelbundock commented 2 months ago

FWIW, the problem is probably in methods_tidymodels.R and get_predict.model_fit().

agmath commented 2 months ago

I agree: the estimated slopes do seem to be evaluated at the actually observed x values. I'm happy to look through the get_predict.model_fit() method in the methods_tidymodels.R script. If I find anything before you do, I'll submit a pull request.

Thanks for pointing me in the right direction!

vincentarelbundock commented 2 months ago

Sounds fantastic (no pressure)! I really appreciate you taking the time to craft a reproducible example and look at the code.

vincentarelbundock commented 4 weeks ago

Thanks again for the report. Should be fixed in GitHub main now.

agmath commented 4 weeks ago

Thank you! I really appreciate it.

(Reply sent by email to Vincent Arel-Bundock's comment of Sat, Oct 5, 2024, 8:26 AM.)

--
Adam Gilbert, PhD
Associate Professor of Mathematics
Southern New Hampshire University