synth-inference / synthdid

Synthetic difference in differences
https://synth-inference.github.io/synthdid
BSD 3-Clause "New" or "Revised" License

Inconsistent plot when adding covariates #76

Open ledainga opened 2 years ago

ledainga commented 2 years ago

Hi,

I have added covariates as an N x T x C array when using the synthdid_estimate(), sc_estimate() and did_estimate() functions. The rows and columns of this 3-D array are ordered in the same way as those of the matrix setup$Y. Nevertheless, when I use synthdid_plot() to plot the results of all three methods (DID, SC, SDID), the treated outcome is not the same in each plot (see attached image). Why does this difference exist? Shouldn't all three plots display exactly the same treated outcome, since this is an observed rather than an estimated outcome?

[attached image: plot]

Thanks a lot!

davidahirshberg commented 2 years ago

When covariates are passed, synthdid_estimate estimates regression coefficients beta for predicting Y from X, then applies the SDID estimator to the regression residuals Y - X beta. As currently implemented, this is not a least squares regression; instead, beta, lambda, and omega are chosen together so that the regressions fitting the unit and time weights to Y - X beta fit as well as possible.* What you’re seeing plotted over time is not the trajectory of Y itself but that of Y - X beta, as that is what you’d want to have parallel trends. Because beta depends on the unit and time weights, it differs from SDID to SC to DID, so the trajectory shown for the treated unit varies from plot to plot as well.
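To make the point concrete, here is a minimal sketch in plain Python (used only for illustration; it is not the package's R code). The outcome values, the single covariate, and the three beta values are all made up; the only thing it shows is that one observed series Y produces different adjusted series Y - X beta once beta differs across methods.

```python
# Observed treated outcome and one covariate over five periods (made-up numbers).
y_treated = [10.0, 11.0, 12.0, 15.0, 16.0]
x_treated = [1.0, 2.0, 3.0, 4.0, 5.0]


def adjusted(y, x, beta):
    """Return the covariate-adjusted series y - x * beta, period by period."""
    return [yi - beta * xi for yi, xi in zip(y, x)]


# Hypothetical coefficients: each method ends up with its own beta because
# beta is chosen jointly with that method's unit and time weights.
betas = {"did": 0.5, "sc": 1.0, "sdid": 0.8}

trajectories = {name: adjusted(y_treated, x_treated, b) for name, b in betas.items()}

for name, series in trajectories.items():
    print(name, series)
```

The observed series `y_treated` is identical in all three cases; only the subtracted term changes.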

This approach to using covariates isn’t one we’ve had a chance to explore all that much, and it may not be appropriate for your application. It may, for example, make more sense for you to estimate beta some other way, then pass Y - X beta to synthdid_estimate as Y without passing covariates. https://github.com/skranz/xsynthdid implements another approach.
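One possible reading of "estimate beta some other way" is sketched below in plain Python. This is an assumption for illustration, not what synthdid or xsynthdid does: it fits a single pooled least-squares slope on the control units only, with one covariate, and then subtracts X beta from every unit. The data and the helper names `pooled_ols_slope` and `residualize` are hypothetical.

```python
def pooled_ols_slope(y_rows, x_rows):
    """Slope of a pooled least-squares fit of y on x (with intercept)."""
    ys = [v for row in y_rows for v in row]
    xs = [v for row in x_rows for v in row]
    n = len(ys)
    y_bar = sum(ys) / n
    x_bar = sum(xs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


def residualize(Y, X, beta):
    """Return the matrix Y - X * beta, keeping the row/column order of Y."""
    return [[y - beta * x for y, x in zip(y_row, x_row)]
            for y_row, x_row in zip(Y, X)]


# Made-up panel: rows are units (controls first), columns are periods.
N0 = 2  # number of control units
Y = [[5.0, 7.0, 9.0, 11.0],
     [7.0, 5.0, 11.0, 9.0],
     [5.0, 9.0, 8.0, 12.0]]   # last row is the treated unit
X = [[1.0, 2.0, 3.0, 4.0],
     [2.0, 1.0, 4.0, 3.0],
     [1.0, 3.0, 2.0, 4.0]]

beta_hat = pooled_ols_slope(Y[:N0], X[:N0])  # fit on controls only
Y_adj = residualize(Y, X, beta_hat)
print(beta_hat, Y_adj)
```

In R, the analogue of `Y_adj` is the matrix that would be passed to synthdid_estimate as Y, with no covariate array supplied. Because the same fixed matrix would then go to all three estimators, the treated trajectory would be identical across the DID, SC, and SDID plots.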

*To be precise, they are chosen so that the synthetic-control-weighted average of controls predicts Y - X beta well for the average treated unit during pre-treatment periods, and the time-weighted average of pre-treatment periods predicts Y - X beta well for the average post-treatment period among control units, by minimizing the sum of mean squared errors for those two predictions. Letting Y(beta) := Y - X beta and using the notation of Section 4 of the paper, this is

$$
\frac{1}{T_{pre}} \left\| \omega_{tr}^\top Y_{tr,pre}(\beta) - \omega_{co}^\top Y_{co,pre}(\beta) \right\|^2 + \zeta_\omega^2 \|\omega\|^2 + \frac{1}{N_{co}} \left\| Y_{co,post}(\beta)\,\lambda_{post} - Y_{co,pre}(\beta)\,\lambda_{pre} \right\|^2 + \zeta_\lambda^2 \|\lambda\|^2
$$
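A rough plain-Python sketch of evaluating this objective for given beta, omega, and lambda follows. It makes several assumptions that go beyond the formula as written: a single covariate, uniform weights 1/N_tr on treated units and 1/T_post on post-treatment periods, ridge penalties applied to the control and pre-period weights only, and no intercept terms. The function name `joint_objective` and the numbers are hypothetical, and this is not the package's implementation.

```python
def joint_objective(Y, X, n_co, t_pre, beta, omega_co, lambda_pre,
                    zeta_omega=0.0, zeta_lambda=0.0):
    """Evaluate the two-part objective on the residuals R = Y - X * beta.

    Rows of Y and X are units (controls first); columns are periods
    (pre-treatment first). One scalar covariate is assumed.
    """
    R = [[y - beta * x for y, x in zip(yr, xr)] for yr, xr in zip(Y, X)]
    n, t = len(R), len(R[0])
    n_tr, t_post = n - n_co, t - t_pre

    # Unit-weight fit: average treated unit vs. omega-weighted controls,
    # compared period by period over the pre-treatment window.
    unit_gaps = [
        sum(R[i][s] for i in range(n_co, n)) / n_tr
        - sum(omega_co[i] * R[i][s] for i in range(n_co))
        for s in range(t_pre)
    ]
    unit_term = sum(g * g for g in unit_gaps) / t_pre
    unit_term += zeta_omega ** 2 * sum(w * w for w in omega_co)

    # Time-weight fit: average post period vs. lambda-weighted pre periods,
    # compared unit by unit over the controls.
    time_gaps = [
        sum(R[i][s] for s in range(t_pre, t)) / t_post
        - sum(lambda_pre[s] * R[i][s] for s in range(t_pre))
        for i in range(n_co)
    ]
    time_term = sum(g * g for g in time_gaps) / n_co
    time_term += zeta_lambda ** 2 * sum(l * l for l in lambda_pre)

    return unit_term + time_term


# Made-up 3-unit, 3-period panel: two controls, two pre-treatment periods.
Y = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [2.0, 3.0, 10.0]]
X = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

value = joint_objective(Y, X, n_co=2, t_pre=2, beta=1.0,
                        omega_co=[0.5, 0.5], lambda_pre=[0.5, 0.5])
print(value)
```

Since beta enters both fit terms through the residuals, the minimizing beta depends on which weights are being fit, which is why it differs between SDID, SC, and DID.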