kosukeimai / mediation

R package mediation
58 stars 29 forks source link

Problem with models that use flexible formula #20

Closed mdtrinh closed 4 years ago

mdtrinh commented 4 years ago

The formula object in R allows variables to be specified flexibly e.g. y ~ factor(X) or y ~ log(X). Doing this when fitting models for mediate, however, results in error.

This example from the package documentation works as expected:

library("mediation")
data("jobs")
set.seed(4646)

model.m <- lm(job_seek ~ treat + depress1 + econ_hard + sex + age
              + occp + marital + nonwhite + educ + income, data = jobs)
model.y <- lm(depress2 ~ treat + job_seek + depress1 + econ_hard + sex + age
              + occp + marital + nonwhite + educ + income, data = jobs)

out.1 <- mediate(model.m, model.y, sims = 10, boot = TRUE, treat = "treat",
                 mediator = "job_seek")
out.2 <- mediate(model.m, model.y, sims = 10, treat = "treat",
                 mediator = "job_seek")

But would break if we replace sexwith factor(sex):

> model.m <- lm(job_seek ~ treat + depress1 + econ_hard + factor(sex) + age
+               + occp + marital + nonwhite + educ + income, data = jobs)
> model.y <- lm(depress2 ~ treat + job_seek + depress1 + econ_hard + sex + age
+               + occp + marital + nonwhite + educ + income, data = jobs)
> 
> out.1 <- mediate(model.m, model.y, sims = 10, boot = TRUE, treat = "treat",
+                  mediator = "job_seek")
Running nonparametric bootstrap

Error in factor(sex) : object 'sex' not found

Or, if we try replacing age with log(age):

> model.m <- lm(job_seek ~ treat + depress1 + econ_hard + sex + log(age)
+               + occp + marital + nonwhite + educ + income, data = jobs)
> model.y <- lm(depress2 ~ treat + job_seek + depress1 + econ_hard + sex + age
+               + occp + marital + nonwhite + educ + income, data = jobs)
> 
> out.1 <- mediate(model.m, model.y, sims = 10, boot = TRUE, treat = "treat",
+                  mediator = "job_seek")
Running nonparametric bootstrap

Error in eval(predvars, data, env) : object 'age' not found

The root cause has to do with how model.frame() handles column names. For example:

> model.m <- lm(job_seek ~ treat + depress1 + econ_hard + factor(sex) + age
+               + occp + marital + nonwhite + educ + income, data = jobs)
> test <- predict(model.frame())
Error in terms.formula(formula, data = data) : 
  argument is not a valid model
> test <- predict(model.m)
> test <- predict(model.m, newdata = model.frame(model.m))
Error in factor(sex) : object 'sex' not found