Interaction with continuous variable

frederickluser commented 11 months ago

Hey Grant,

Hey @vincentarelbundock (maybe you can help me too because this is in the end a question on marginaleffects rather than etwfe)

I'm using the xvar argument so far to calculate effects for different groups of a categorical variable. This works great.

As an example, I can calculate a treatment effect for education = 1, 2, or 3, and emfx gives me three treatment effects, one for each group. This would be something like this:

    Term                 Contrast Estimate Std. Error      z   Pr(>|z|)  2.5 % 97.5 % .Dtreat educ
 .Dtreat mean(TRUE) - mean(FALSE)   1.1529   0.010874 106.02 < 2.22e-16 1.1315 1.1742    TRUE    1
 .Dtreat mean(TRUE) - mean(FALSE)   2.1092   0.010739 196.41 < 2.22e-16 2.0882 2.1303    TRUE    2
 .Dtreat mean(TRUE) - mean(FALSE)   3.1726   0.009533 332.82 < 2.22e-16 3.1540 3.1913    TRUE    3

However, for a continuous variable, I guess most researchers would want to report something different for a linear model: simply an intercept and the interaction term. Suppose I want to interact the treatment with distance:

etwfe(fml = y ~ 0, tvar = T,  gvar = G, xvar = distance, data = dat)

Then etwfe gives

GLM estimation, Dep. Var.: y
Observations: 1,000 
Fixed-effects: G: 19,  T: 3
Standard-errors: Clustered (G) 
                          Estimate Std. Error    t value   Pr(>|t|)    
.Dtreat:G::1:T::2:distance  0.014189   0.010905   1.301083 2.0964e-01    
.Dtreat:G::1:T::3:distance -0.178339   0.011416 -15.622212 6.5175e-12 ***
.Dtreat:G::2:T::2:distance -0.061419   0.010905  -5.631918 2.4157e-05 ***
.Dtreat:G::2:T::3:distance  0.419388   0.011416  36.737625  < 2.2e-16 ***
.Dtreat:G::3:T::3:distance -0.140821   0.011416 -12.335705 3.2324e-10 ***
...

which is good so far and emfx reports the marginal effects for different values of distance.

    Term                 Contrast .Dtreat ldist_air Estimate Std. Error        z Pr(>|z|)    S  2.5 % 97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.88  -0.0372       6.59 -0.00564    0.996  0.0 -12.96  12.88
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      6.69 132.9161      24.82  5.35418   <0.001 23.5  84.26 181.57
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.24   1.7932      11.79  0.15216    0.879  0.2 -21.31  24.89
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.45  -7.0808       6.12 -1.15642    0.248  2.0 -19.08   4.92
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.06  -2.2928       6.43 -0.35667    0.721  0.5 -14.89  10.31
--- 79 rows omitted. See ?print.marginaleffects --- 
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.10  -2.8199       6.39 -0.44118    0.659  0.6 -15.35   9.71
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.25   8.4341       7.29  1.15744    0.247  2.0  -5.85  22.72
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.82 120.9891      16.57  7.30238   <0.001 41.7  88.52 153.46
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.34   6.9434       5.00  1.38910    0.165  2.6  -2.85  16.74
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.25  -9.4350       9.73 -0.97017    0.332  1.6 -28.50   9.63

But it's a linear model, and I would simply like to see an intercept and a slope coefficient from emfx. Sadly, I am not an expert on the marginaleffects package. Do you know how I could achieve that? (And if you think this would be useful, I could also try to implement and push it.)
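
To spell out what "intercept and slope" would mean here, a rough sketch in equations. This is a reading of the coefficient names in the output above (e.g. `.Dtreat:G::1:T::2:distance`), not something taken from the etwfe documentation: it assumes that each cohort-period cell $(g, t)$ gets its own intercept $\alpha_{g,t}$ and slope $\beta_{g,t}$, and the weights $w_{g,t}$ are a hypothetical choice (for example, the share of treated observations in each cell).

```latex
% Treatment effect in cell (g, t) as a linear function of the moderator x (here: distance)
\tau_{g,t}(x) = \alpha_{g,t} + \beta_{g,t}\, x

% A "collapsed" summary would average both pieces over the treated cells
\bar{\alpha} = \sum_{g,t} w_{g,t}\, \alpha_{g,t}, \qquad
\bar{\beta}  = \sum_{g,t} w_{g,t}\, \beta_{g,t}, \qquad
\sum_{g,t} w_{g,t} = 1
```

Under this reading, each row of the emfx output above is $\tau_{g,t}(x)$ evaluated at an observed value of $x$, whereas the quantities asked for are the two averages $\bar{\alpha}$ and $\bar{\beta}$.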

All the best, Frederic

vincentarelbundock commented 11 months ago

Maybe you can extract them from the fixest model? It's usually stored in attr(mfx_object, "model")

frederickluser commented 11 months ago

Dear Vincent,

Thanks for your answer. I think my question was not well formulated. I tried a couple of random fixest regressions, and they don't have a model attribute in my case? What I want to get is something like this:

     Term     Contrast Estimate Std. Error        z Pr(>|z|)   S 2.5 % 97.5 %
  .Dtreat TRUE - FALSE     5.62   3.81e+00 1.48e+00     0.14 2.8 -1.84  13.08
 distance        dY/dX     1.37   5.28e-06 2.60e+05   <0.001 Inf  1.37   1.37

So, I want to "collapse" the etwfe coefficients into one marginal effect for .Dtreat and one for xvar = distance. Any idea how I can do that? This would then be the ETWFE "equivalent" of a standard TWFE regression with one interacted continuous regressor, like

feols(y ~ .Dtreat*distance | id + period, dat)
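
For reference, a minimal base-R sketch of what such an interacted TWFE regression reports, run on simulated data. Everything in it is an assumption made for illustration: the simulated panel, the variable names (`treat`, `distance`, `id`, `period`), and the true effect of `2 + 0.5 * distance`. It uses `lm()` with `factor()` dummies in place of `feols()` so that it runs without fixest installed.

```r
# Simulated balanced panel; all names and parameter values are illustrative
set.seed(42)
n_id <- 200
n_period <- 4
dat <- expand.grid(id = seq_len(n_id), period = seq_len(n_period))

distance_by_id <- runif(n_id, 0, 10)       # time-invariant continuous moderator
unit_fe        <- rnorm(n_id)              # unit fixed effects
treated_id     <- seq_len(n_id) <= n_id / 2  # first half of the units is treated

dat$distance <- distance_by_id[dat$id]
# Treated units switch on from period 3 onwards
dat$treat <- as.integer(treated_id[dat$id] & dat$period >= 3)

# True treatment effect is linear in distance: 2 + 0.5 * distance
dat$y <- unit_fe[dat$id] + 0.3 * dat$period +
  dat$treat * (2 + 0.5 * dat$distance) +
  rnorm(nrow(dat), sd = 0.3)

# Two-way fixed effects via dummies. The main effect of distance is left out
# because it is time-invariant and therefore absorbed by the unit dummies.
fit <- lm(y ~ treat + treat:distance + factor(id) + factor(period), data = dat)

# "Intercept" (effect at distance = 0) and "slope" (change in effect per unit of distance)
est <- coef(fit)[c("treat", "treat:distance")]
print(round(est, 3))
```

The two printed numbers are the kind of summary being asked for: one coefficient for the treatment itself and one for how the effect changes with the continuous variable. This only covers the plain TWFE case; it does not reproduce what etwfe estimates.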

vincentarelbundock commented 11 months ago

I meant the marginaleffects object has an attribute with the original fixest model in it.

But no, sorry, I would have to think about this more (and don't really have time right now.) Sorry!

grantmcdermott commented 11 months ago

Sorry, missed this when it first came out. @frederickluser do you have a fake dataset that I could play around with to test?

frederickluser commented 11 months ago

I played around with randomly generated data. But I think the mpdta should do a good job.

data("mpdta", package = "did")
mod <- etwfe::etwfe(
    fml  = lemp ~ 0, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    xvar = lpop,
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )

etwfe::emfx(mod)
   Term                 Contrast .Dtreat   lpop Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.0658     1.38      0.226  6.11   <0.001 29.9 0.938   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.2461     1.39      0.220  6.32   <0.001 31.8 0.957   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.3075     1.39      0.217  6.39   <0.001 32.5 0.964   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.5014     1.40      0.211  6.63   <0.001 34.8 0.984   1.81
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 1.2983     1.43      0.183  7.78   <0.001 46.9 1.066   1.78
--- 181 rows omitted. See ?print.marginaleffects --- 
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.2479     1.60      0.134 11.98   <0.001 107.5 1.340   1.86
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.4683     2.70      0.400  6.75   <0.001  36.0 1.919   3.49
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.6865     1.62      0.144 11.23   <0.001  94.9 1.335   1.90
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 7.0310     1.57      0.197  8.00   <0.001  49.5 1.188   1.96
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 7.2399     1.64      0.159 10.29   <0.001  80.1 1.325   1.95
Columns: term, contrast, .Dtreat, lpop, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
Type:  response

and I think most researchers would prefer something like this (including me for my paper):

  Term     Contrast Estimate Std. Error      z Pr(>|z|)     S 2.5 % 97.5 %
 .Dtreat TRUE - FALSE    1.506     0.1170  12.87   <0.001 123.4  1.28  1.735
 lpop_dm dY/dX           1.119     0.0212  52.79   <0.001   Inf  1.08  1.161
 year    2004 - 2003    -1.234     0.0925 -13.34   <0.001 132.4 -1.42 -1.052
 year    2005 - 2003    -1.245     0.0926 -13.45   <0.001 134.5 -1.43 -1.064
 year    2006 - 2003    -1.130     0.1003 -11.27   <0.001  95.4 -1.33 -0.934
 year    2007 - 2003    -0.907     0.1383  -6.56   <0.001  34.1 -1.18 -0.636

I got this output with

mfx = marginaleffects::avg_slopes(object, newdata = dat, wts = "N",...)

inside the emfx function. But I don't think this is the object I am looking for, which should be the ETWFE equivalent of the interaction term in this regression:

feols(lemp ~ treat*lpop | countyreal + year, data = mpdta %>% mutate(treat = ifelse(year >= first.treat, 1, 0)))

OLS estimation, Dep. Var.: lemp
Observations: 2,500 
Fixed-effects: countyreal: 500,  year: 5
Standard-errors: Clustered (countyreal) 
            Estimate Std. Error   t value Pr(>|t|) 
treat      -0.070708   0.045057 -1.569291  0.11721 
treat:lpop  0.009618   0.010878  0.884134  0.37705 
... 1 variable was removed because of collinearity (lpop)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.124185     Adj. R2: 0.991506
                 Within R2: 0.004788

I hope that makes sense. Greetings, Frederic

grantmcdermott / etwfe

Interaction with continuous variable #43