grantmcdermott / etwfe

Extended two-way fixed effects
https://grantmcdermott.com/etwfe/

Interaction with continuous variable #43

Open frederickluser opened 9 months ago

frederickluser commented 9 months ago

Hey Grant,

Hey @vincentarelbundock (maybe you can help me too because this is in the end a question on marginaleffects rather than etwfe)

I'm using the xvar argument so far to calculate effects for different groups of a categorical variable. This works great.

As an example, I can calculate a treatment effect for education = 1, 2, or 3, and emfx gives me one treatment effect for each of these three groups. This would look something like this:

    Term                 Contrast Estimate Std. Error      z   Pr(>|z|)  2.5 % 97.5 % .Dtreat educ
 .Dtreat mean(TRUE) - mean(FALSE)   1.1529   0.010874 106.02 < 2.22e-16 1.1315 1.1742    TRUE    1
 .Dtreat mean(TRUE) - mean(FALSE)   2.1092   0.010739 196.41 < 2.22e-16 2.0882 2.1303    TRUE    2
 .Dtreat mean(TRUE) - mean(FALSE)   3.1726   0.009533 332.82 < 2.22e-16 3.1540 3.1913    TRUE    3

However, for a continuous variable, I guess most researchers want to report something different in a linear model: simply an intercept and the interaction coefficient. Suppose I want to interact the treatment with distance:

etwfe(fml = y ~ 0, tvar = T,  gvar = G, xvar = distance, data = dat)

Then etwfe gives

GLM estimation, Dep. Var.: y
Observations: 1,000 
Fixed-effects: G: 19,  T: 3
Standard-errors: Clustered (G) 
                          Estimate Std. Error    t value   Pr(>|t|)    
.Dtreat:G::1:T::2:distance  0.014189   0.010905   1.301083 2.0964e-01    
.Dtreat:G::1:T::3:distance -0.178339   0.011416 -15.622212 6.5175e-12 ***
.Dtreat:G::2:T::2:distance -0.061419   0.010905  -5.631918 2.4157e-05 ***
.Dtreat:G::2:T::3:distance  0.419388   0.011416  36.737625  < 2.2e-16 ***
.Dtreat:G::3:T::3:distance -0.140821   0.011416 -12.335705 3.2324e-10 ***
...

which is good so far and emfx reports the marginal effects for different values of distance.

    Term                 Contrast .Dtreat ldist_air Estimate Std. Error        z Pr(>|z|)    S  2.5 % 97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.88  -0.0372       6.59 -0.00564    0.996  0.0 -12.96  12.88
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      6.69 132.9161      24.82  5.35418   <0.001 23.5  84.26 181.57
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.24   1.7932      11.79  0.15216    0.879  0.2 -21.31  24.89
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.45  -7.0808       6.12 -1.15642    0.248  2.0 -19.08   4.92
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.06  -2.2928       6.43 -0.35667    0.721  0.5 -14.89  10.31
--- 79 rows omitted. See ?print.marginaleffects --- 
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.10  -2.8199       6.39 -0.44118    0.659  0.6 -15.35   9.71
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.25   8.4341       7.29  1.15744    0.247  2.0  -5.85  22.72
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.82 120.9891      16.57  7.30238   <0.001 41.7  88.52 153.46
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      8.34   6.9434       5.00  1.38910    0.165  2.6  -2.85  16.74
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE      9.25  -9.4350       9.73 -0.97017    0.332  1.6 -28.50   9.63

But it's a linear model, and I would like emfx to simply report an intercept and a slope coefficient. Sadly, I am not an expert on the marginaleffects package. Do you know how I could achieve that? (And if you think this would be useful, I could also try to implement it and push it.)

All the best, Frederic
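
To make the "intercept and slope" idea concrete, here is a minimal base R sketch on simulated data. It assumes a single plain linear interaction model (one cohort, one period), not the full etwfe specification. The names `x`, `D`, `effect_intercept` and `effect_slope` are made up for illustration. The point is only that, in such a model, the per-observation treatment effects lie exactly on a line in the moderator, so they collapse to two numbers.

```r
set.seed(42)
n <- 500
x <- runif(n, 5, 10)    # hypothetical continuous moderator (e.g. distance)
D <- rbinom(n, 1, 0.5)  # hypothetical binary treatment indicator
y <- 1 + 0.3 * x + 2 * D - 0.5 * D * x + rnorm(n, sd = 0.1)

fit <- lm(y ~ D * x)

# Per-observation effects: prediction with D = 1 minus prediction with D = 0
te <- predict(fit, data.frame(x = x, D = 1)) -
      predict(fit, data.frame(x = x, D = 0))

# Because the model is linear in x, these effects fall exactly on a line
line <- coef(lm(te ~ x))

# The same two numbers, read directly off the fitted coefficients
effect_intercept <- unname(coef(fit)["D"])    # effect at x = 0
effect_slope     <- unname(coef(fit)["D:x"])  # change in effect per unit of x
```

In the etwfe setting each cohort-period cell has its own interaction coefficient, so a single intercept and slope would additionally require some weighting across cells; the sketch does not attempt that.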

vincentarelbundock commented 9 months ago

Maybe you can extract them from the fixest model? It's usually stored in attr(mfx_object, "model")

frederickluser commented 9 months ago

Dear Vincent,

Thanks for your answer. I think my question was not well formulated. I tried a couple of random fixest regressions, and they don't seem to have a model attribute in my case. What I want to get is something like this:

     Term     Contrast Estimate Std. Error        z Pr(>|z|)   S 2.5 % 97.5 %
  .Dtreat TRUE - FALSE     5.62   3.81e+00 1.48e+00     0.14 2.8 -1.84  13.08
 distance        dY/dX     1.37   5.28e-06 2.60e+05   <0.001 Inf  1.37   1.37

So, I want to "collapse" the etwfe coefficients into one marginal effect for .Dtreat and one for xvar = distance. Any idea how I can do that? This would then be the ETWFE "equivalent" of a standard TWFE regression with one interacted continuous regressor, like

feols(y ~ .Dtreat*distance | id + period, dat)

vincentarelbundock commented 9 months ago

I meant the marginaleffects object has an attribute with the original fixest model in it.

But no, sorry, I would have to think about this more (and don't really have time right now.) Sorry!

grantmcdermott commented 9 months ago

Sorry, missed this when it first came out. @frederickluser do you have a fake dataset that I could play around with to test?

frederickluser commented 9 months ago

I played around with randomly generated data, but I think the mpdta dataset should do a good job.

data("mpdta", package = "did")
mod <- etwfe::etwfe(
    fml  = lemp ~ 0, # outcome ~ controls
    tvar = year,        # time variable
    gvar = first.treat, # group variable
    xvar = lpop,
    data = mpdta,       # dataset
    vcov = ~countyreal  # vcov adjustment (here: clustered)
  )

etwfe::emfx(mod)
   Term                 Contrast .Dtreat   lpop Estimate Std. Error     z Pr(>|z|)    S 2.5 % 97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.0658     1.38      0.226  6.11   <0.001 29.9 0.938   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.2461     1.39      0.220  6.32   <0.001 31.8 0.957   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.3075     1.39      0.217  6.39   <0.001 32.5 0.964   1.82
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 0.5014     1.40      0.211  6.63   <0.001 34.8 0.984   1.81
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 1.2983     1.43      0.183  7.78   <0.001 46.9 1.066   1.78
--- 181 rows omitted. See ?print.marginaleffects --- 
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.2479     1.60      0.134 11.98   <0.001 107.5 1.340   1.86
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.4683     2.70      0.400  6.75   <0.001  36.0 1.919   3.49
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 6.6865     1.62      0.144 11.23   <0.001  94.9 1.335   1.90
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 7.0310     1.57      0.197  8.00   <0.001  49.5 1.188   1.96
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE 7.2399     1.64      0.159 10.29   <0.001  80.1 1.325   1.95
Columns: term, contrast, .Dtreat, lpop, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 
Type:  response 

and I think most researchers would prefer something like this (including me for my paper):

  Term     Contrast Estimate Std. Error      z Pr(>|z|)     S 2.5 % 97.5 %
 .Dtreat TRUE - FALSE    1.506     0.1170  12.87   <0.001 123.4  1.28  1.735
 lpop_dm dY/dX           1.119     0.0212  52.79   <0.001   Inf  1.08  1.161
 year    2004 - 2003    -1.234     0.0925 -13.34   <0.001 132.4 -1.42 -1.052
 year    2005 - 2003    -1.245     0.0926 -13.45   <0.001 134.5 -1.43 -1.064
 year    2006 - 2003    -1.130     0.1003 -11.27   <0.001  95.4 -1.33 -0.934
 year    2007 - 2003    -0.907     0.1383  -6.56   <0.001  34.1 -1.18 -0.636

I got this output with

mfx = marginaleffects::avg_slopes(object, newdata = dat, wts = "N",...)

in the emfx function. But I don't think this is the object I am looking for, which should be the ETWFE equivalent of the interaction term in this regression:

feols(lemp ~ treat*lpop | countyreal + year, data = mpdta %>% mutate(treat = ifelse(year >= first.treat, 1, 0)))

OLS estimation, Dep. Var.: lemp
Observations: 2,500 
Fixed-effects: countyreal: 500,  year: 5
Standard-errors: Clustered (countyreal) 
            Estimate Std. Error   t value Pr(>|t|) 
treat      -0.070708   0.045057 -1.569291  0.11721 
treat:lpop  0.009618   0.010878  0.884134  0.37705 
... 1 variable was removed because of collinearity (lpop)
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.124185     Adj. R2: 0.991506
                 Within R2: 0.004788
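
One detail that may be worth checking in the feols() call above: in mpdta, never-treated counties are coded with first.treat = 0, so `year >= first.treat` is TRUE for them in every year and they would be flagged as treated. A minimal sketch of the difference, using a small made-up data frame (`toy`, `naive` and `post` are illustrative names, not part of mpdta or etwfe):

```r
# Toy panel: one unit first treated in 2004, one never-treated unit (coded 0)
toy <- data.frame(
  year        = c(2003, 2004, 2005, 2003, 2004, 2005),
  first.treat = c(2004, 2004, 2004,    0,    0,    0)
)

# Indicator as written in the regression call: also switches on for the 0-coded unit
naive <- as.integer(toy$year >= toy$first.treat)

# Post-treatment indicator that leaves never-treated units at 0
post <- as.integer(toy$first.treat > 0 & toy$year >= toy$first.treat)

cbind(toy, naive, post)
```

If that is indeed what happens here, the plain TWFE comparison would need the second form of the indicator, and the estimates shown above would change.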

I hope that makes sense. Greetings, Frederic