grantmcdermott / etwfe

Extended two-way fixed effects
https://grantmcdermott.com/etwfe/

Query RE Heterogeneous treatment effects #29

Closed PhilipCarthy closed 1 year ago

PhilipCarthy commented 1 year ago

Dear Grant,

Thank you very much for creating a great package. I have a query regarding the behaviour of the xvar argument for heterogeneous treatment effects:

In the case of a categorical xvar with more than two levels, I noticed what might be some unexpected behaviour when the variables are demeaned. The demeaned dataset created by etwfe() seems to have a number of variables with NA column names that don't appear to enter the estimation. As an example, I have augmented the quickstart example on the package homepage to estimate heterogeneous treatment effects across quartiles of lpop:

library(data.table)
data("mpdta", package = "did")
setDT(mpdta)
# Create quartiles of lpop
mpdta[, lpop_quartile := cut(lpop,
                        breaks = quantile(lpop, probs = 0:4/4),
                        labels = 1:4, include.lowest=TRUE)]

mod =
  etwfe(
    fml  = lemp ~ 0, 
    tvar = year,        
    gvar = first.treat, 
    data = mpdta,   
    xvar = lpop_quartile,
    vcov = ~countyreal  
  )

# etwfe_dat() is etwfe with the final line set to return(data) instead of return(est)  
mod_dat = etwfe_dat(
  fml  = lemp ~ 0, 
  tvar = year,        
  gvar = first.treat, 
  data = mpdta,   
  xvar = lpop_quartile,
  vcov = ~countyreal  
)

> colnames(mod_dat)
 [1] "year"     "countyreal"  "lpop"   "lemp"     "first.treat"      "treat"           
 [7] "lpop_quartile"   ".Dtreat"   ".Dtreated_cohort" "lpop_quartile_dm" NA   NA

> head(mod_dat)
  year countyreal     lpop     lemp first.treat treat lpop_quartile .Dtreat .Dtreated_cohort lpop_quartile_dm         NA         NA
1 2003       8001 5.896761 8.461469        2007     1             4   FALSE                1       -0.2356021 -0.3089005  0.7277487
2 2004       8001 5.896761 8.336870        2007     1             4   FALSE                1       -0.2356021 -0.3089005  0.7277487
3 2005       8001 5.896761 8.340217        2007     1             4   FALSE                1       -0.2356021 -0.3089005  0.7277487
4 2006       8001 5.896761 8.378161        2007     1             4   FALSE                1       -0.2356021 -0.3089005  0.7277487
5 2007       8001 5.896761 8.487352        2007     1             4    TRUE                1       -0.2356021 -0.3089005  0.7277487
6 2003       8019 2.232377 4.997212        2007     1             1   FALSE                1       -0.2356021 -0.3089005 -0.2722513

The two NA columns appear to be other levels of lpop_quartile after demeaning. Perhaps I am misunderstanding the process here; would you be able to confirm whether this is the expected behaviour of the function?

Also, I notice that the process for demeaning the `xvar` is relative to time rather than the usual group demeaning that appears in the case of covariates in Wooldridge (2021):

https://github.com/grantmcdermott/etwfe/blob/55a032a4cc3a8672f9ec436328368ee2690deddb/R/etwfe.R#L334-L341

I wonder if you could also confirm that this is as expected? Could you provide some intuition as to why a different approach is required here if this is the case?

Many thanks again for providing such an excellent package, and I very much appreciate your help on this.

grantmcdermott commented 1 year ago

Dear Philip, thanks very much for the detailed post and kind words about the package. I'm inundated at work at the moment and won't be able to look at this properly for a few days. But I'm tagging @frederickluser, who wrote most of the xvar code and might be able to answer quickly.

grantmcdermott commented 1 year ago

Hi @PhilipCarthy, sorry it's taking me a while to get to this. Far too many balls in the air ATM.

The two NA columns appear to be other levels of lpop_quartile after demeaning. Perhaps I am misunderstanding the process here; would you be able to confirm whether this is the expected behaviour of the function?

Yup, this looks like a bug and is due to the way we "recycle" the covariate names here. My first instinct is to add these columns directly to the underlying formula that etwfe creates instead. But this will likely create complications elsewhere in the code. I need some time to think of / test a bulletproof solution. Please feel free to put in a PR if you have something in mind.

Also, I notice that the process for demeaning the `xvar` is relative to time rather than the usual group demeaning that appears in the case of covariates in Wooldridge (2021)... I wonder if you could also confirm that this is as expected? Could you provide some intuition as to why a different approach is required here if this is the case?

Hmmm, again I think you're right. To me this looks like a copy-paste error, where line 337 above should be:

 dm_fml = stats::reformulate(c(tvar, gvar), response = xvar) 

i.e., include gvar in the demeaning formula.
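For anyone following along, the difference between the two candidate calls is easy to see by substituting the concrete variable names from the example above (inside etwfe, `tvar` and `gvar` hold these names as strings); this is just an illustrative sketch of what `stats::reformulate()` returns:

```r
# demean by time only (current behaviour)
stats::reformulate("year", response = "lpop_quartile")
#> lpop_quartile ~ year

# demean by time and cohort (the proposed tweak)
stats::reformulate(c("year", "first.treat"), response = "lpop_quartile")
#> lpop_quartile ~ year + first.treat
```

The right-hand side of this formula determines which fixed effects fixest::demean() sweeps out of the covariate.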

Testing this tweak with the heterogeneous ATT example from the docs yields the following:

> devtools::load_all(".") # testing version of etwfe
ℹ Loading etwfe
> 
> data("mpdta", package = "did")
> 
> gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27,
+         "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55)
> mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls
> 
> hmod = etwfe(
+   lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta, 
+   vcov = ~countyreal,
+   xvar = gls           ## <= het. TEs by gls
+ )
> 
> emfx(hmod)

    Term                 Contrast .Dtreat   gls Estimate Std. Error     z Pr(>|z|)   2.5 %  97.5 %
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE FALSE  -0.0637     0.0372 -1.71   0.0873 -0.1367 0.00931
 .Dtreat mean(TRUE) - mean(FALSE)    TRUE  TRUE  -0.0472     0.0267 -1.77   0.0764 -0.0995 0.00500

Columns: term, contrast, .Dtreat, gls, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo 

This output (i.e., ATTs of -0.0637 and -0.0472) intuitively seems more correct than what we currently get (i.e., ATTs of -0.0433 and -0.0366), since these estimates span the ATT of the full sample (i.e., -0.0506).

@frederickluser are we missing something obvious here? Please let me know ASAP. Otherwise I'd like to commit this fix and submit a new version to CRAN soon, since I don't want the package to be giving misleading results for these heterogeneous treatment effects.

PhilipCarthy commented 1 year ago

Dear Grant,

Thank you for looking into this - I very much appreciate the detailed testing. I agree that the results in the example here are much more intuitive, given the ATT of the full sample. Since opening this issue, I have been trying to understand the underlying theory a bit more. I note that demeaning relative to all treated units (as the weights argument in the current fixest::demean() call for xvar would imply?) is mentioned in Wooldridge's (2021) paper. See, for example, equation 5.7 in this version. However, I think this is specifically in relation to the common timing setting where there is no further group structure in the data. I don't see an equivalent discussion in relation to the staggered case, but throughout Section 6, where staggered interventions are discussed, all demeaning appears to occur relative to groups. My apologies; I am afraid my understanding of the method is not yet sufficient to be certain of what is correct here.

Many thanks also for looking into the name-recycling issue. I also thought that adding the names directly to the formula could work, and some manual checks seemed to give sensible answers. However, I was uncertain of how to translate this to a general case that could work more broadly in the etwfe() function.

Many thanks again for looking into all of this.

frederickluser commented 1 year ago

Dear Grant, Dear Philip,

Thanks a lot for digging into this. To start with some context, Wooldridge's goal with xvar is "to make the common trends assumption more plausible and to allow treatment effects to vary by observed covariates." Okay, so we are interested here in the second part.

Then, if I understand the paper correctly, we want to center "the covariates at cohort averages", meaning, we want to demean $x_{it}$ in a way that gets us $\bar x_i$ - getting rid of the time $t$ dimension.

Or, following the version that you sent, @PhilipCarthy, Wooldridge writes on page 44 (6.34) that $\dot{x}_{ir} = x_i - \bar{x}_r$, where $\dot{x}_{ir}$ is the demeaned variable we want to use in the regression.

Now, that was what I wanted to achieve with `dm_fml = stats::reformulate(c(tvar), response = xvar)` instead of `dm_fml = stats::reformulate(c(tvar, gvar), response = xvar)`.

But I have to say, I'm now very confused about how to use reformulate correctly here. What do you think?

Thanks and best, Frederic

grantmcdermott commented 1 year ago

Hi folks, thanks for the ongoing discussion. I've looked over the paper again and think (agree) the correct stance is simply to center by gvar (i.e., the cohort):

dm_fml = stats::reformulate(gvar, response = xvar) 

(It's not so important now, but looking through my code I remembered that the original demeaning by both tvar+gvar was an artifact of restricting covariates to be time-invariant... a restriction that I dropped in the last release of the package.)
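For intuition, centering by gvar alone amounts to subtracting cohort averages from the covariate. A minimal data.table sketch of that operation on the mpdta example from above (the `lpop_dm` name is just illustrative, not what etwfe produces internally):

```r
library(data.table)
data("mpdta", package = "did")
setDT(mpdta)
# center lpop at cohort (gvar = first.treat) averages, per Wooldridge (2021)
mpdta[, lpop_dm := lpop - mean(lpop), by = first.treat]
```

Each observation's demeaned value then varies only across cohorts, not over time, which matches the "centered at cohort averages" language in the paper.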

There are a couple of other things that I'd like to clean up internally, which I'll hopefully get to soon. Again, very much appreciate the input and comments here.

grantmcdermott commented 1 year ago

@PhilipCarthy and @frederickluser I've (finally) pushed through a PR that fixes the cohort grouping in #30. It passes all CI checks but I'd appreciate it if either of you could clone my branch and test with some of the manual examples we discussed above. I had to rush this out because I'm heading out on vacation. In fact, I already left and don't have access to my computer ;-)

frederickluser commented 1 year ago

Dear Grant,

I ran some simulations with the code in your PR against the last CRAN version. Differences seem to be minor for this data, i.e., estimates are very similar in both cases:

# The "old" code
    Term                 Contrast Estimate Std. Error      z   Pr(>|z|)  2.5 % 97.5 % .Dtreat sex
 .Dtreat mean(TRUE) - mean(FALSE)   3.9519   0.011669 338.67 < 2.22e-16 3.9290 3.9748    TRUE   4
 .Dtreat mean(TRUE) - mean(FALSE)   0.2369   0.010326  22.94 < 2.22e-16 0.2167 0.2571    TRUE   0
 .Dtreat mean(TRUE) - mean(FALSE)   2.1107   0.010592 199.27 < 2.22e-16 2.0900 2.1315    TRUE   2
 .Dtreat mean(TRUE) - mean(FALSE)   5.0149   0.010901 460.03 < 2.22e-16 4.9936 5.0363    TRUE   5
 .Dtreat mean(TRUE) - mean(FALSE)   1.1544   0.010634 108.56 < 2.22e-16 1.1336 1.1752    TRUE   1
 .Dtreat mean(TRUE) - mean(FALSE)   3.1741   0.009495 334.28 < 2.22e-16 3.1555 3.1927    TRUE   3

# Using the PR
    Term                 Contrast Estimate Std. Error      z   Pr(>|z|)  2.5 % 97.5 % .Dtreat sex
 .Dtreat mean(TRUE) - mean(FALSE)   3.9505   0.011588 340.92 < 2.22e-16 3.9278 3.9733    TRUE   4
 .Dtreat mean(TRUE) - mean(FALSE)   0.2353   0.010675  22.04 < 2.22e-16 0.2143 0.2562    TRUE   0
 .Dtreat mean(TRUE) - mean(FALSE)   2.1092   0.010739 196.41 < 2.22e-16 2.0882 2.1303    TRUE   2
 .Dtreat mean(TRUE) - mean(FALSE)   5.0136   0.010679 469.48 < 2.22e-16 4.9927 5.0345    TRUE   5
 .Dtreat mean(TRUE) - mean(FALSE)   1.1529   0.010874 106.02 < 2.22e-16 1.1315 1.1742    TRUE   1
 .Dtreat mean(TRUE) - mean(FALSE)   3.1726   0.009533 332.82 < 2.22e-16 3.1540 3.1913    TRUE   3

I also tried with large variation in time fixed effects – same result. So maybe it doesn't matter too much. Moreover, the data_collapse default works very nicely! The only thing I noticed is the first point @PhilipCarthy mentioned: the NA-named columns that are added in the data.

Thanks for your work and enjoy your vacation, Frederic

grantmcdermott commented 1 year ago

Super, thanks @frederickluser! Can you do one final check for me: What do you get with the Great Lakes states (hmod) example from the vignette? Once we confirm that those results are sensible, and I'm 99% sure they will be, I'll merge the PR.

PS. Thanks for calling out the still-present NA issue. I plan to address that separately once I'm back from vacation.

frederickluser commented 1 year ago

Perfect. So these are the results:

First for the current CRAN version:

   Term                 Contrast Estimate Std. Error      z   Pr(>|z|)   2.5 %  97.5 % .Dtreat   gls
 .Dtreat mean(TRUE) - mean(FALSE)   0.1332    0.07528  1.770   0.076725 -0.0143  0.2808    TRUE FALSE
 .Dtreat mean(TRUE) - mean(FALSE)  -0.5857    0.12430 -4.712 2.4499e-06 -0.8294 -0.3421    TRUE  TRUE

And for the new code in the PR

   Term                 Contrast Estimate Std. Error      z  Pr(>|z|)    2.5 %   97.5 % .Dtreat   gls
 .Dtreat mean(TRUE) - mean(FALSE)   0.2546    0.11202  2.273 0.0230154  0.03509  0.47419    TRUE FALSE
 .Dtreat mean(TRUE) - mean(FALSE)  -0.2236    0.07446 -3.003 0.0026702 -0.36958 -0.07769    TRUE  TRUE

Thus, with this real-world data we see indeed some difference! The ATT is in both cases

    Term                 Contrast Estimate Std. Error      z   Pr(>|z|)    2.5 %  97.5 % .Dtreat
 .Dtreat mean(TRUE) - mean(FALSE) -0.04771    0.01327 -3.595 0.00032499 -0.07372 -0.0217    TRUE

Then, I also calculate the weighted mean of the heterogeneous effects for both:

CRAN: -0.5857 * (520/2500) + 0.1332 * ((2500 - 520)/2500) = -0.0163
PR: -0.2236 * (520/2500) + 0.2546 * ((2500 - 520)/2500) = 0.15513
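That weighted-mean check can be reproduced in a couple of lines (group sizes as reported above; a sketch, assuming 520 of the 2,500 observations are in the gls group):

```r
w <- 520 / 2500  # share of observations in the gls = TRUE group
c(cran = -0.5857 * w + 0.1332 * (1 - w),
  pr   = -0.2236 * w + 0.2546 * (1 - w))
# cran is roughly -0.0163, pr roughly 0.1551
```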

Now, you see that the CRAN average of the heterogeneous effects is far closer to the ATT. What do you think of this...?

grantmcdermott commented 1 year ago

I finally had a chance to test myself. (I was hesitating, since your results don't seem to correspond to what I initially got on my branch @frederickluser. I'm not sure why TBH.) Regardless, I think this should be good to go now, so I'll merge #30. Will close this issue once I manage to get around to the NA column issue. Thanks for the patience everyone.

grantmcdermott commented 1 year ago

@PhilipCarthy This should now be fixed in the dev version. If I run your model from above and retrieve the dataset (via fixest:::fetch_data) then we can see all of the columns are accounted for.

# <snip>
> fixest:::fetch_data(mod) |> head()
  year countyreal     lpop     lemp first.treat treat lpop_quartile .Dtreat .Dtreated_cohort
1 2003       8001 5.896761 8.461469        2007     1             4   FALSE                1
2 2004       8001 5.896761 8.336870        2007     1             4   FALSE                1
3 2005       8001 5.896761 8.340217        2007     1             4   FALSE                1
4 2006       8001 5.896761 8.378161        2007     1             4   FALSE                1
5 2007       8001 5.896761 8.487352        2007     1             4    TRUE                1
6 2003       8019 2.232377 4.997212        2007     1             1   FALSE                1
  lpop_quartile2_dm lpop_quartile3_dm lpop_quartile4_dm
1        -0.2061069        -0.3129771          0.740458
2        -0.2061069        -0.3129771          0.740458
3        -0.2061069        -0.3129771          0.740458
4        -0.2061069        -0.3129771          0.740458
5        -0.2061069        -0.3129771          0.740458
6        -0.2061069        -0.3129771         -0.259542

Running emfx(mod) also appears to be producing sensible output.

(The linked PR above includes a simulation example with known output, so I'm pretty confident at this point that everything is working properly.)

I plan to submit a new version to CRAN shortly, probably tomorrow, to get this fix out to everyone. Thanks again for drawing my attention to these issues and please don't hesitate to flag any further unexpected behaviour (same for you @frederickluser).

grantmcdermott commented 1 year ago

P.S. These updates are in v0.3.2 of the package, which is now up on CRAN.

PhilipCarthy commented 1 year ago

@grantmcdermott Thank you very much for all your efforts to develop a fix on this. The new version of the package is generating the expected results for me too. Thank you @frederickluser for your work on this as well. The package is a great contribution that will assist me, and I suspect many others, in properly implementing the estimator in a range of research contexts.