vincentarelbundock / marginaleffects

R package to compute and plot predictions, slopes, marginal means, and comparisons (contrasts, risk ratios, odds, etc.) for over 100 classes of statistical and ML models. Conduct linear and non-linear hypothesis tests, or equivalence tests. Calculate uncertainty estimates using the delta method, bootstrapping, or simulation-based inference
https://marginaleffects.com
Other
449 stars 47 forks source link

Contrasts not working as expected anymore? #372

Closed rokapre closed 2 years ago

rokapre commented 2 years ago

Hello, awesome package! I'm the one who pointed out on twitter that marginaleffects was G comp btw.

In the latest update though, something seems really strange in the output. This is what I get in a simple simulation

library(marginaleffects)
library(dplyr)

set.seed(1)
n = 10
group = c(rep("A",n/2),rep("B",n/2))
x = runif(n,0,10)

xA = x[group=="A"]
xB = x[group=="B"]

yA = 2 + 2*xA + rnorm(n/2,0,0.1)
yB = -1 + 3*xB + rnorm(n/2,0,0.1)

simdat = tibble(group=group,x=c(xA,xB),y=c(yA,yB))
simdat$group = as.factor(simdat$group)

model_additive = lm(y~x+group,data=simdat)
model_interaction = lm(y~x*group,data=simdat)

summary(model_additive)
summary(model_interaction)

#g comp on whole data 
simdat_doA = simdat |> mutate(group="A")
simdat_doB = simdat |> mutate(group="B")

mean(predict(model_additive,newdata=simdat_doB)) - mean(predict(model_additive,newdata=simdat_doA))

mean(predict(model_interaction,newdata=simdat_doB)) - mean(predict(model_interaction,newdata=simdat_doA))

marginaleffects(model_additive) |> tidy() #why 2 things?
marginaleffects(model_interaction) |> tidy() # again 2 things, different this time

comparisons(model_additive,var="group") |> tidy() #same issues
comparisons(model_interaction,var="group") |> tidy()

#seeing if the counterfactual contrast g comp is being computed within the groups itself
# ie similar to ATT/ATU (effect of treatment on treated, and untreated) 
#this seems like what is happening but not what was asked for

simdat_A_doA = simdat |> filter(group=="A")
simdat_A_doB = simdat |> filter(group=="A") |> mutate(group="B")

simdat_B_doA = simdat |> filter(group=="B") |> mutate(group="A")
simdat_B_doB = simdat |> filter(group=="A") 

mean(predict(model_interaction,newdata=simdat_A_doB)) - mean(predict(model_interaction,newdata=simdat_A_doA))
mean(predict(model_interaction,newdata=simdat_B_doB)) - mean(predict(model_interaction,newdata=simdat_B_doA)) #but opp sign of what was reported

#confirms this is what is being computed

I am seeing 2 outputs for B-A, one for group B and one for group A. It doesn't make much sense. After further investigation, it looks like now its suddenly performing the G comp within the group itself and thus ending up with 2 different effects which is not what was queried initially. In addition, for the 2nd group the effect is in the opposite direction of what it should be.

It seems like something changed recently. Is there a way to go back to the behavior of how it was a few weeks ago? I realize there are lots of new features now, though it seems like somewhere something broke adding new stuff.

Edit:

Now I am able to get what I want with inserting counterfactual() but I didn't have to do this before, and also there is the 2 rows that are the same problem which is a bit strange. Its also not the default behavior like it was before and this counterfactual() is explicitly required

comparisons(model_interaction,var="group",newdata=counterfactual(group=c("A","B"))) |> tidy()
vincentarelbundock commented 2 years ago

Thanks again for that! It’s such a cool link.

The grid_type="counterfactual" will duplicate the whole dataset before computing any contrast, so that works. But you should also be able to not use datagrid() at all, because under-the-hood we duplicate the data anyway whenever we compute a contrast.

The problem here seems to be that you are using group as a variable name, but I am using the same variable name internally for tidy() to group-summarize over. It’s obviously a terrible idea to "reserve" such a common word in the package, so I’ll try to change it behind the scenes, or at least build some guardrails.

Here’s your example with a different variable name:

library(marginaleffects)

set.seed(1)
n <- 10
group_id <- c(rep("A", n / 2), rep("B", n / 2))
x <- runif(n, 0, 10)

xA <- x[group_id == "A"]
xB <- x[group_id == "B"]

yA <- 2 + 2 * xA + rnorm(n / 2, 0, 0.1)
yB <- -1 + 3 * xB + rnorm(n / 2, 0, 0.1)

simdat <- data.frame(group_id = group_id, x = c(xA, xB), y = c(yA, yB))
simdat$group_id <- as.factor(simdat$group_id)

m1 <- lm(y ~ x + group_id, data = simdat)
m2 <- lm(y ~ x * group_id, data = simdat)

# g comp on whole data
simdat_doA <- simdat_doB <- simdat
simdat_doA$group_id <- "A"
simdat_doB$group_id <- "B"

g1 <- mean(predict(m1, newdata = simdat_doB)) -
      mean(predict(m1, newdata = simdat_doA))
g2 <- mean(predict(m2, newdata = simdat_doB)) -
      mean(predict(m2, newdata = simdat_doA))

c1 <- comparisons(m1, variable = "group_id")
c1 <- tidy(c1)

c2 <- comparisons(m2, variable = "group_id")
c2 <- tidy(c2)

all.equal(g1, c1$estimate)
#> [1] TRUE
all.equal(g2, c2$estimate)
#> [1] TRUE
rokapre commented 2 years ago

Oh wow, yea bad coincidence then. No wonder it was just happening for this but on my "real" data I wasn't having a problem. Specifically I happened to be just simulating some simpler data to test out some stuff. Since I am here already may as well ask-- how do I get a contrast of a joint-intervention (double average treatment effect -DATE) or a contrast of a conditional average treatment effect (CATE)?

These are different in general (so my guess is different syntax), although on my data happen to be the same since I have a 2-factor experiment. The experiment in my case is nonlinear y = f(x,treatment) and x is continuous while treatment is binary. So I am using GAMs on my real data.

I was just trying to get to replicating it in the linear with interactions case above (but then I ran into this coincidental issue since I called the column 'group' in my simulation).

What would be the syntax for comparing the effect of x itself across treatment groups? Essentially comparing the average slopes between treatments? (This problem can be formulated as a contrast of joint interventions on both x and treatment or as a contrast of x across treatments as observed)

vincentarelbundock commented 2 years ago

cool cool

In the development version on Github I added a hypothesis argument which allows linear and non-linear hypothesis tests. It also accepts vectors and matrices of weights, which allow you to create all sorts of contrasts, across interventions or response level. See this vignette (especially the bottom half): https://vincentarelbundock.github.io/marginaleffects/articles/hypothesis.html#custom-contrasts-with-vectors-and-matrices

Let me know if this doesn't work for you, or if you think of an easier user interface.

rokapre commented 2 years ago

Thanks, I'm trying some stuff out right now and what I want is basically the effect of x contrasted between groups B and A.

I did this:

marginaleffects(model_interaction,var="x",
                newdata = datagrid(group_id=c("A","B")),hypothesis="pairwise")

And in this particular simulated case it just coincidentally happens to be the right answer, but that's because the model here is linear. So this won't generalize. datagrid() is setting the x at the mean x value and then doing the contrast (which is only equivalent when the model is linear with no polynomial/spline terms)

What I want is using all the observed x values in the dataset and then doing the average contrast, but the following with grid_type="counterfactual" does not give that (instead it seems to just average the results for A and B, but I want the difference B-A:

marginaleffects(model_interaction,var="x",newdata = datagrid(group_id=c("A","B"),grid_type="counterfactual")) |> tidy()

And using hypothesis="pairwise" here seems to compare every row of the created dataset and not the summary:

marginaleffects(model_interaction,var="x",
                newdata = datagrid(group_id=c("A","B"),grid_type="counterfactual"),hypothesis="pairwise")

The closest I could get was:

marginaleffects(model_interaction,var="x",
                newdata = datagrid(group_id=c("A","B"),grid_type="counterfactual")) |> tidy(by="group_id")

But that gives it for the separate groups, when I want the contrast between those. I don't think there is a hypothesis="" argument in tidy().

vincentarelbundock commented 2 years ago

Got it.

I have some ideas but will need to work on the backend a bit. Will let you know when I have something to show. Hopefully it won't take too long.

rokapre commented 2 years ago

Thanks! Reading more on the linear contrasts, I found a (somewhat too hacky way for my taste) by doing:

rep(+-1/10,10) since n=10 for the data.

marginaleffects(model_interaction,var="x",
                newdata = datagrid(group_id=c("A","B"),grid_type="counterfactual"),hypothesis=c(rep(-1/10,10),rep(1/10,10)))

That syntax seems to basically take the individual differences in the dydx as if every individual got A and B then average them to get the B-A difference contrast in the effect of x.

In this case, since its a simple linear model with interaction, the output should match the interaction term of summary(model_interaction) though and the SE matches up well although strangely the p-value of e-252 doesn't and is extremely low (Edit: actually I see why the p-value is much lower lol it was a simple case of using the z test while lm uses t test)

So I guess the functionality for this is there somewhere, but not easy to do without knowing too many of the details like exactly what you are doing.

vincentarelbundock commented 2 years ago

How about this?

Using the very latest version from Github, you can use the transform_pre argument to emulate the effect of calling both comparisons() and tidy(). See the vignette: https://vincentarelbundock.github.io/marginaleffects/articles/transformation.html

library(marginaleffects)

mod <- lm(mpg ~ factor(cyl) + factor(gear), mtcars)

comparisons(mod) |>
    tidy() |>
    subset(select = 1:5)
#>       type term contrast   estimate std.error
#> 1 response  cyl    6 - 4  -6.655978  1.629136
#> 2 response  cyl    8 - 4 -10.542210  1.957977
#> 3 response gear    4 - 3   1.324094  1.927619
#> 4 response gear    5 - 3   1.500181  1.854852

comparisons(
    mod,
    transform_pre = function(hi, lo) mean(hi) - mean(lo)) |>
    subset(select = 1:5)
#>       type term contrast comparison std.error
#> 1 response  cyl     6, 4  -6.655978  1.629136
#> 2 response  cyl     8, 4 -10.542210  1.957977
#> 3 response gear     4, 3   1.324094  1.927619
#> 4 response gear     5, 3   1.500181  1.854852

Or use the string shortcut for convenience:

comparisons(
    mod,
    transform_pre = "differenceavg") |>
    subset(select = 1:5)
#>       type term          contrast comparison std.error
#> 1 response  cyl mean(6) - mean(4)  -6.655978  1.629136
#> 2 response  cyl mean(8) - mean(4) -10.542210  1.957977
#> 3 response gear mean(4) - mean(3)   1.324094  1.927619
#> 4 response gear mean(5) - mean(3)   1.500181  1.854852

Finally, we combine this with the hypothesis argument, as documented:

comparisons(
    mod,
    transform_pre = "differenceavg",
    hypothesis = "r1 = r3",
    newdata = mtcars) |>
    subset(select = 1:5)
#>       type       term hypothesis comparison std.error
#> 1 response hypothesis      r1=r3  -7.980072  2.252165

The newdata in that last call is not actually necessary, but you’ll get a useless warning if you don’t include it.

vincentarelbundock commented 2 years ago

Obviously, this is not your exact solution. I'm just trying to illustrate the mechanisms. I think it's quite flexible...

vincentarelbundock commented 2 years ago

"Reserved" variable names now trigger a warning with f5611772d