Dear Philip, thanks very much for the detailed post and kind words about the package. I'm inundated at work at the moment and won't be able to look at this properly for a few days. But I'm tagging @frederickluser, who wrote most of the xvar code and might be able to answer quickly.
Hi @PhilipCarthy, sorry it's taking me a while to get to this. Far too many balls in the air ATM.
> The two NA columns appear to be other levels of lpop_quartile after demeaning. Perhaps I am misunderstanding the process here; I wonder would you be able to confirm if this is the expected behaviour of the function?
Yup, this looks like a bug and is due to the way we "recycle" the covariate names here. My first instinct is to add these columns directly to the underlying formula that etwfe creates instead. But that will likely create complications elsewhere in the code. I need some time to think of / test a bulletproof solution. Please feel free to put in a PR if you have something in mind.
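Very roughly, and purely as a sketch of that idea (nothing tested), the extra demeaned level dummies from your example would have to end up on the RHS of the internal formula by name, e.g.:

## Sketch only: add the demeaned level dummies to the formula RHS
dm_cols = c("lpop_quartile2_dm", "lpop_quartile3_dm", "lpop_quartile4_dm")
stats::reformulate(c("lpop", dm_cols), response = "lemp")
## lemp ~ lpop + lpop_quartile2_dm + lpop_quartile3_dm + lpop_quartile4_dm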
> Also, I notice that the process for demeaning the `xvar` is relative to time rather than the usual group demeaning that appears in the case of covariates in Wooldridge (2021)... I wonder if you could also confirm that this is as expected? Could you provide some intuition as to why a different approach is required here if this is the case?
Hmmm, again I think you're right. To me this looks like a copy-paste error, where line 337 above should be:
dm_fml = stats::reformulate(c(tvar, gvar), response = xvar)
i.e., include gvar in the demeaning formula.
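To make the difference concrete, here is what the two calls produce with the variable names from the example below (purely illustrative; tvar = year, gvar = first.treat, xvar = gls):

> stats::reformulate("year", response = "gls")                    # current behaviour: time only
gls ~ year
> stats::reformulate(c("year", "first.treat"), response = "gls")  # proposed: time + cohort
gls ~ year + first.treat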
Testing this tweak with the heterogeneous ATT example from the docs yields the following:
> devtools::load_all(".") # testing version of etwfe
ℹ Loading etwfe
>
> data("mpdta", package = "did")
>
> gls = c("IL" = 17, "IN" = 18, "MI" = 26, "MN" = 27,
+ "NY" = 36, "OH" = 39, "PA" = 42, "WI" = 55)
> mpdta$gls = substr(mpdta$countyreal, 1, 2) %in% gls
>
> hmod = etwfe(
+ lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta,
+ vcov = ~countyreal,
+ xvar = gls ## <= het. TEs by gls
+ )
>
> emfx(hmod)
Term Contrast .Dtreat gls Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 %
.Dtreat mean(TRUE) - mean(FALSE) TRUE FALSE -0.0637 0.0372 -1.71 0.0873 -0.1367 0.00931
.Dtreat mean(TRUE) - mean(FALSE) TRUE TRUE -0.0472 0.0267 -1.77 0.0764 -0.0995 0.00500
Columns: term, contrast, .Dtreat, gls, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo
This output (i.e., ATTs of -0.0637 and -0.0472) intuitively seems more correct than what we currently get (i.e., ATTs of -0.0433 and -0.0366), since they span the ATT of the full sample (i.e., -0.0506).
@frederickluser are we missing something obvious here? Please let me know ASAP. Otherwise I'd like to commit this fix and submit a new version to CRAN soon, since I don't want the package to be giving misleading results for these heterogeneous treatment effects.
Dear Grant,
Thank you for looking into this - I very much appreciate the detailed testing. I agree that the results in the example here are much more intuitive, given the ATT of the full sample. Since opening this issue, I have been trying to understand the underlying theory a bit more. I note that demeaning relative to all treated units (as the weights argument in the current fixest::demean() call for xvar would imply?) is mentioned in Wooldridge's (2021) paper. See, for example, equation 5.7 in this version. However, I think this is specifically in relation to the common timing setting, where there is no further group structure in the data. I don't see an equivalent discussion in relation to the staggered case, but throughout Section 6, where staggered interventions are discussed, all demeaning appears to occur relative to groups. My apologies; I am afraid my understanding of the method is not yet sufficient to be certain of what is correct here.
Many thanks also for looking into the name-recycling issue. I also thought that adding the names directly to the formula could work, and some manual checks seemed to give sensible answers. However, I was uncertain of how to translate this to a general case that could work more broadly in the etwfe() function.
Many thanks again for looking into all of this.
Dear Grant, Dear Philip,
Thanks a lot for digging into this. To start with some context, Wooldridge's goal with xvar is "to make the common trends assumption more plausible and to allow treatment effects to vary by observed covariates." Okay, so we are interested here in the second part.
Then, if I understand the paper correctly, we want to center "the covariates at cohort averages", meaning we want to demean $x_{it}$ in a way that gets us $\bar{x}_i$ - getting rid of the time $t$ dimension.
Or, following the version that you sent, @PhilipCarthy, Wooldridge writes on page 44 (6.34) that $\dot{x}_{ir} = x_i - \bar{x}_r$, and $\dot{x}_{ir}$ is the demeaned variable we want to use in the regression.
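To fix ideas, (6.34) just says: subtract the cohort mean. In base R on the mpdta example from above, that would be something like this (illustration only, not the package internals):

data("mpdta", package = "did")
# x_it minus its cohort (first.treat) mean, as in equation (6.34)
mpdta$lpop_dm = mpdta$lpop - ave(mpdta$lpop, mpdta$first.treat)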
Now, that was what I wanted to achieve with dm_fml = stats::reformulate(c(tvar), response = xvar) instead of dm_fml = stats::reformulate(c(tvar, gvar), response = xvar).
But I have to say, I'm very confused now about how to correctly use reformulate here. What do you think?
Thanks and best, Frederic
Hi folks, thanks for the ongoing discussion. I've looked over the paper again and think (agree) that the correct stance is simply to center by gvar (i.e., the cohort):
dm_fml = stats::reformulate(gvar, response = xvar)
(It's not so important now, but looking through my code I remembered that the original demeaning by both tvar+gvar was an artifact of restricting covariates to be time-invariant... a restriction that I dropped in the last release of the package.)
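In isolation, that demeaning step boils down to something like this (a sketch using the gls example from above, not the exact internal code):

# gvar-only demeaning of the heterogeneity variable (sketch)
mpdta$gls_num = as.numeric(mpdta$gls)                             # numeric copy of the logical gls
dm_fml = stats::reformulate("first.treat", response = "gls_num")  # gls_num ~ first.treat
gls_dm = fixest::demean(dm_fml, data = mpdta)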
There are a couple of other things that I'd like to clean up internally, which I'll hopefully get to soon. Again, very much appreciate the input and comments here.
@PhilipCarthy and @frederickluser I've (finally) pushed through a PR that fixes the cohort grouping in #30. It passes all CI checks but I'd appreciate it if either of you could clone my branch and test with some of the manual examples we discussed above. I had to rush this out because I'm heading out on vacation. In fact, I already left and don't have access to my computer ;-)
Dear Grant,
I ran some simulations with the code in your PR against the last CRAN version. Differences seem to be minor for this data, i.e., estimates are very similar in both cases:
# The "old" code
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat sex
.Dtreat mean(TRUE) - mean(FALSE) 3.9519 0.011669 338.67 < 2.22e-16 3.9290 3.9748 TRUE 4
.Dtreat mean(TRUE) - mean(FALSE) 0.2369 0.010326 22.94 < 2.22e-16 0.2167 0.2571 TRUE 0
.Dtreat mean(TRUE) - mean(FALSE) 2.1107 0.010592 199.27 < 2.22e-16 2.0900 2.1315 TRUE 2
.Dtreat mean(TRUE) - mean(FALSE) 5.0149 0.010901 460.03 < 2.22e-16 4.9936 5.0363 TRUE 5
.Dtreat mean(TRUE) - mean(FALSE) 1.1544 0.010634 108.56 < 2.22e-16 1.1336 1.1752 TRUE 1
.Dtreat mean(TRUE) - mean(FALSE) 3.1741 0.009495 334.28 < 2.22e-16 3.1555 3.1927 TRUE 3
# Using the PR
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat sex
.Dtreat mean(TRUE) - mean(FALSE) 3.9505 0.011588 340.92 < 2.22e-16 3.9278 3.9733 TRUE 4
.Dtreat mean(TRUE) - mean(FALSE) 0.2353 0.010675 22.04 < 2.22e-16 0.2143 0.2562 TRUE 0
.Dtreat mean(TRUE) - mean(FALSE) 2.1092 0.010739 196.41 < 2.22e-16 2.0882 2.1303 TRUE 2
.Dtreat mean(TRUE) - mean(FALSE) 5.0136 0.010679 469.48 < 2.22e-16 4.9927 5.0345 TRUE 5
.Dtreat mean(TRUE) - mean(FALSE) 1.1529 0.010874 106.02 < 2.22e-16 1.1315 1.1742 TRUE 1
.Dtreat mean(TRUE) - mean(FALSE) 3.1726 0.009533 332.82 < 2.22e-16 3.1540 3.1913 TRUE 3
I also tried with large variation in time fixed effects – same result. So maybe it doesn't matter too much. Moreover, the data_collapse default works very nicely!
The only thing I noticed is the first point @PhilipCarthy mentioned: the NA-named columns that are added in the data.
Thanks for your work and enjoy your vacation, Frederic
Super, thanks @frederickluser! Can you do one final check for me: what do you get with the Great Lakes states (hmod) example from the vignette? Once we confirm that those results are sensible, and I'm 99% sure they will be, I'll merge the PR.
PS. Thanks for calling out the still-present NA issue. I plan to address that separately once I'm back from vacation.
Perfect. So these are the results:
First for the current CRAN version:
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat gls
.Dtreat mean(TRUE) - mean(FALSE) 0.1332 0.07528 1.770 0.076725 -0.0143 0.2808 TRUE FALSE
.Dtreat mean(TRUE) - mean(FALSE) -0.5857 0.12430 -4.712 2.4499e-06 -0.8294 -0.3421 TRUE TRUE
And for the new code in the PR:
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat gls
.Dtreat mean(TRUE) - mean(FALSE) 0.2546 0.11202 2.273 0.0230154 0.03509 0.47419 TRUE FALSE
.Dtreat mean(TRUE) - mean(FALSE) -0.2236 0.07446 -3.003 0.0026702 -0.36958 -0.07769 TRUE TRUE
Thus, with this real-world data we do indeed see some difference! In both cases, the ATT is:
Term Contrast Estimate Std. Error z Pr(>|z|) 2.5 % 97.5 % .Dtreat
.Dtreat mean(TRUE) - mean(FALSE) -0.04771 0.01327 -3.595 0.00032499 -0.07372 -0.0217 TRUE
Then, I also calculate the weighted mean for both:
CRAN: -0.5857 * (520/2500) + 0.1332 * ((2500-520)/2500) = -0.0163
PR: -0.2236 * (520/2500) + 0.2546 * ((2500-520)/2500) = 0.15513
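(Same arithmetic in R, for reference:)

w = 520 / 2500  # share of Great Lakes observations
c(CRAN = -0.5857 * w + 0.1332 * (1 - w),
  PR   = -0.2236 * w + 0.2546 * (1 - w))
# CRAN: -0.0163312, PR: 0.1551344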
Now, you see that the CRAN average of the heterogeneous effects is far closer to the ATT. What do you think of this...?
I finally had a chance to test myself. (I was hesitating, since your results don't seem to correspond to what I initially got on my branch @frederickluser. I'm not sure why TBH.) Regardless, I think this should be good to go now, so I'll merge #30. Will close this issue once I manage to get around to the NA column issue. Thanks for the patience everyone.
@PhilipCarthy This should now be fixed in the dev version. If I run your model from above and retrieve the dataset (via fixest:::fetch_data), then we can see that all of the columns are accounted for.
# <snip>
> fixest:::fetch_data(mod) |> head()
year countyreal lpop lemp first.treat treat lpop_quartile .Dtreat .Dtreated_cohort
1 2003 8001 5.896761 8.461469 2007 1 4 FALSE 1
2 2004 8001 5.896761 8.336870 2007 1 4 FALSE 1
3 2005 8001 5.896761 8.340217 2007 1 4 FALSE 1
4 2006 8001 5.896761 8.378161 2007 1 4 FALSE 1
5 2007 8001 5.896761 8.487352 2007 1 4 TRUE 1
6 2003 8019 2.232377 4.997212 2007 1 1 FALSE 1
lpop_quartile2_dm lpop_quartile3_dm lpop_quartile4_dm
1 -0.2061069 -0.3129771 0.740458
2 -0.2061069 -0.3129771 0.740458
3 -0.2061069 -0.3129771 0.740458
4 -0.2061069 -0.3129771 0.740458
5 -0.2061069 -0.3129771 0.740458
6 -0.2061069 -0.3129771 -0.259542
Running emfx(mod) also appears to be producing sensible output.
(The linked PR above includes a simulation example with known output, so I'm pretty confident at this point that everything is working properly.)
I plan to submit a new version to CRAN shortly, probably tomorrow, to get this fix out to everyone. Thanks again for drawing my attention to these issues and please don't hesitate to flag any further unexpected behaviour (same for you @frederickluser).
P.S. These updates are in v0.3.2 of the package, which is now up on CRAN.
@grantmcdermott Thank you very much for all your efforts to develop a fix for this. The new version of the package is generating the expected results for me too. Thank you @frederickluser for your work on this as well. The package is a great contribution that will assist me, and I suspect many others, in properly implementing the estimator in a range of research contexts.
Dear Grant,
Thank you very much for creating a great package. I have a query regarding the behaviour of the xvar argument for heterogeneous treatment effects.
In the case of a categorical xvar with more than two levels, I noticed what might be some unexpected behaviour when the variables are demeaned. The demeaned dataset created by etwfe() seems to have a number of variables with NA column names that don't appear to enter the estimation. As an example, I have augmented the quickstart example on the package homepage to estimate heterogeneous treatment effects across quartiles of lpop:
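Roughly along these lines (a sketch of the augmentation: lpop is cut into quartiles and the result passed as xvar):

library(etwfe)
data("mpdta", package = "did")

# Cut lpop into quartiles and use it as the heterogeneity variable
mpdta$lpop_quartile = cut(
  mpdta$lpop,
  breaks = quantile(mpdta$lpop, probs = seq(0, 1, by = 0.25)),
  include.lowest = TRUE, labels = 1:4
)

mod = etwfe(
  lemp ~ lpop, tvar = year, gvar = first.treat, data = mpdta,
  vcov = ~countyreal,
  xvar = lpop_quartile  ## <= het. TEs by quartile of lpop
)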
The two NA columns appear to be other levels of lpop_quartile after demeaning. Perhaps I am misunderstanding the process here; I wonder would you be able to confirm if this is the expected behaviour of the function?
Also, I notice that the process for demeaning the `xvar` is relative to time rather than the usual group demeaning that appears in the case of covariates in Wooldridge (2021):
https://github.com/grantmcdermott/etwfe/blob/55a032a4cc3a8672f9ec436328368ee2690deddb/R/etwfe.R#L334-L341
I wonder if you could also confirm that this is as expected? Could you provide some intuition as to why a different approach is required here if this is the case?
Many thanks again for providing such an excellent package, and I very much appreciate your help on this.