leeper / margins

An R Port of Stata's 'margins' Command
https://cloud.r-project.org/package=margins

Possible bug: inferring the model data via the call expression #183

Open Chris-Larkin opened 2 years ago

Chris-Larkin commented 2 years ago

First, thank you for writing and sharing margins. As an ex-Stata user, I find it fills a big gap in my analytical workflow since transitioning to R.

I'm trying to use the margins package to get marginal effects of a simple linear model, but it returns the error:

Error in eval(model[["call"]][["data"]], env) : object '.' not found

This data can be used to reproduce the problem:

forty_rows <- structure(list(wk_dist_eff_nov16 = structure(c(18, -24, -35, 
-30, 18, 18, 4, -56, -41, 31, 18, -20, 36, 18, -15, 18, 35, 18, 
18, -58, -52, -21, -47, 19, 18, 23, -38, 4, -50, -63, 31, -2, 
-27, 2, 18, 18, -8, -12, 14, 19), class = "difftime", units = "days"), 
    election_2016_11 = c(NA, NA, "0", NA, "0", "0", "1", NA, 
    NA, "0", "0", NA, "1", "0", "0", "0", "1", "0", "0", NA, 
    NA, "0", "0", "0", "1", "1", "1", "1", NA, NA, "0", NA, "0", 
    NA, "1", "0", NA, NA, "0", "1")), class = "data.frame", row.names = c(NA, 
-40L))

library(tidyverse)
library(broom)  # tidy() is not attached by library(tidyverse)

model <- forty_rows %>% 
             filter(!is.na(election_2016_11), 
                    wk_dist_eff_nov16 %in% -36:0) %>% 
    lm(as.numeric(election_2016_11) ~ as.factor(wk_dist_eff_nov16), data = .)

This returns nonsensical values, but that is just an artefact of how I created the small reproducible example (sampling only 40 rows from a data frame of more than 500k rows). I can then call tidy() on the model object without a problem, but margins() returns an error:

tidy(model)

# A tibble: 4 x 5
  term                            estimate std.error statistic p.value
  <chr>                              <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)                            0       NaN       NaN     NaN
2 as.factor(wk_dist_eff_nov16)-27        0       NaN       NaN     NaN
3 as.factor(wk_dist_eff_nov16)-21        0       NaN       NaN     NaN
4 as.factor(wk_dist_eff_nov16)-15        0       NaN       NaN     NaN

library(margins)
margins(model)

Error in eval(model[["call"]][["data"]], env) : object '.' not found
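The message suggests that the data argument stored in the model call is the pipe placeholder `.`, which no longer exists once the pipe has finished. Here is a minimal base-R sketch of that situation. It does not use margins itself; the `toy` data frame and the anonymous function standing in for the pipe are assumptions made for illustration, and the exact environment margins evaluates the call in may differ.

```r
# Toy data (hypothetical, not the data from this issue)
toy <- data.frame(y = c(0, 1, 0, 1, 1, 0), x = c(1, 2, 3, 4, 5, 6))

# The environment this script runs in; `toy` lives here, `.` does not
here <- environment()

# Mimic a pipe: inside the function the data is only known as `.`
fit_piped <- (function(.) lm(y ~ x, data = .))(toy)

# The stored call still refers to the symbol `.`
print(fit_piped$call$data)

# Re-evaluating that symbol outside the function fails,
# so tryCatch() returns the error message as a string
lookup <- tryCatch(
  eval(fit_piped$call$data, here),
  error = function(e) conditionMessage(e)
)
print(lookup)

# Fitting against a named data frame keeps the call resolvable later
fit_named <- lm(y ~ x, data = toy)
recovered <- eval(fit_named$call$data, here)
```

If this is indeed the mechanism, saving the filtered data under a name and passing that name to lm() should leave a call that can be re-evaluated afterwards.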

Based on this Stack Overflow (SO) discussion, the error seems to occur because margins tries to infer the model data via the call expression. A workaround suggested in the SO post is to specify the data explicitly in the margins() call, e.g.:

margins(model, data=model$model)

But this returns the error:

Error in attributes(.Data) <- c(attributes(.Data), attrib) : 'names' attribute [1] must be the same length as the vector [0]

Additionally, when I attempt to use this workaround with my full dataset (which is too large to post here, but I'm happy to share it as an attachment if that would be useful), I get the error:

Error in seq_len(nrow(data)) : argument must be coercible to non-negative integer

Someone on SO identified this as coming from lines 15 and 19 of margins:::dydx.factor.
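One thing that may be relevant to the `data = model$model` attempt: when transformations such as as.numeric() and as.factor() are written inside the formula, the columns of the stored model frame are named after those expressions, not after the original variables. The base-R sketch below shows this on hypothetical toy data (`toy`, `outcome`, and `week` are made-up names). Whether this is what triggers the errors above is an assumption I have not verified against the margins source.

```r
# Toy data (hypothetical): a character outcome and a numeric predictor
toy <- data.frame(
  outcome = c("0", "1", "0", "1", "1", "0", "1", "0"),
  week    = c(-2, -2, -1, -1, 0, 0, -2, -1),
  stringsAsFactors = FALSE
)

# Transformations inside the formula, as in the model above
fit_inline <- lm(as.numeric(outcome) ~ as.factor(week), data = toy)
inline_names <- names(fit_inline$model)
print(inline_names)  # columns are named after the expressions

# Convert the columns first, then fit on plain column names
toy_clean <- data.frame(
  outcome = as.numeric(toy$outcome),
  week    = as.factor(toy$week)
)
fit_clean <- lm(outcome ~ week, data = toy_clean)
clean_names <- names(fit_clean$model)
print(clean_names)   # columns keep the original variable names
```

Converting the variables before fitting keeps the model frame and the source data frame on the same column names, which may make it easier to pass a matching data frame to margins() explicitly.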