insongkim / PanelMatch


Best practices for balancing on pre-trends in outcome variable? #95

Closed zhaoliresearch closed 2 years ago

zhaoliresearch commented 2 years ago

Hello,

If differential pre-trends are a concern in my TSCS (time-series cross-sectional) data, would it be kosher to incorporate lagged changes (rather than levels) of the outcome variable in my matching formula, and then assess covariate balance on those lagged changes in the refined matched sets?

To illustrate my point, suppose we run the following lines of code on the dem dataset first:

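    library(PanelMatch)  # provides the dem dataset
    library(dplyr)

    # First difference of y within each country, ordered by year
    dem <- dem %>%
      arrange(wbcode2, year) %>%
      group_by(wbcode2) %>%
      mutate(diff_y = y - lag(y)) %>%
      ungroup()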

Could we assess pre-trends in the matched data based on covariate balance on the diff_y generated this way? And if we wanted to minimize imbalance in diff_y, would it make more sense to incorporate lagged values of diff_y (rather than lagged values of y) in the matching formula?
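To make clear what diff_y contains, here is a minimal base-R sketch of the same within-unit first difference on a toy panel. The data frame `toy` and its values are made up for illustration and are not part of the dem dataset.

```r
# Toy panel (hypothetical data, not dem): two units observed for four years each.
toy <- data.frame(
  unit = rep(c("A", "B"), each = 4),
  year = rep(2000:2003, times = 2),
  y    = c(1, 3, 6, 10, 5, 5, 4, 2)
)

# Sort by unit and year so each difference uses the previous row of the same unit.
toy <- toy[order(toy$unit, toy$year), ]

# Within-unit first difference; the first year of each unit has no
# predecessor, so its difference is NA.
toy$diff_y <- ave(toy$y, toy$unit, FUN = function(v) c(NA, diff(v)))

print(toy)
```

Two things follow from this construction. The difference is positional, as with dplyr's lag(), so if a unit has gaps in its years the "change" spans more than one year. And because diff_y at time t uses y at t-1, lags 1:4 of diff_y reach back to y at t-5, one period further than lags 1:4 of y.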

For the dem dataset, the PanelMatch vignette gives the following example of refined matching based on lagged y (and lagged tradewb):

    PM.results.maha <- PanelMatch(
      lag = 4, time.id = "year", unit.id = "wbcode2",
      treatment = "dem", refinement.method = "mahalanobis",
      data = dem, match.missing = TRUE,
      covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(y, 1:4)),
      size.match = 5, qoi = "att", outcome.var = "y",
      lead = 0:4, forbid.treatment.reversal = FALSE,
      use.diagonal.variance.matrix = TRUE
    )

If I examine covariate balance on diff_y from PM.results.maha based on the following line of code:

get_covariate_balance(PM.results.maha$att, data=dem, covariates = "diff_y", plot=TRUE)

I get the following plot: covbal_lagy

However, if I replace y with diff_y in the matching formula as follows:

    PM.results.maha2 <- PanelMatch(
      lag = 4, time.id = "year", unit.id = "wbcode2",
      treatment = "dem", refinement.method = "mahalanobis",
      data = dem, match.missing = TRUE,
      covs.formula = ~ I(lag(tradewb, 1:4)) + I(lag(diff_y, 1:4)),
      size.match = 5, qoi = "att", outcome.var = "y",
      lead = 0:4, forbid.treatment.reversal = FALSE,
      use.diagonal.variance.matrix = TRUE
    )

and then run the same get_covariate_balance call on PM.results.maha2$att to assess covariate balance on diff_y, I get the following plot: covbal_lagdiffy
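As a rough illustration of what a balance number on diff_y summarizes, here is a simplified standardized difference between treated units' lagged changes and the mean lagged change of their matched controls. This is a sketch under my own assumptions: the helper `std_diff` is hypothetical, the formula is a simplification rather than PanelMatch's exact computation, and all numbers are invented purely to show the arithmetic (they say nothing about the dem results).

```r
# Hypothetical helper: average gap between each treated value and the mean of
# its matched controls, scaled by the standard deviation of the treated values.
std_diff <- function(treated, control_means) {
  mean(treated - control_means) / sd(treated)
}

# Invented lagged changes (diff_y at t-1) for four treated units.
treated_diff <- c(2.0, -1.0, 0.5, 1.5)

# Invented matched-control means for two hypothetical matched sets.
controls_a <- c(0.5, 0.0, -0.5, 0.2)  # controls whose changes track the treated poorly
controls_b <- c(1.8, -0.7, 0.4, 1.2)  # controls whose changes track the treated closely

balance_a <- std_diff(treated_diff, controls_a)
balance_b <- std_diff(treated_diff, controls_b)

print(c(a = balance_a, b = balance_b))
```

A value closer to zero means the matched controls had pre-treatment changes more similar to those of the treated units, which is the sense in which I am reading the balance plots as a check on pre-trends.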

It seems to me that the second approach reduces imbalance in pre-trends, but please let me know whether I'm mistaken about this. Thank you!