synth-inference / synthdid

Synthetic difference in differences
https://synth-inference.github.io/synthdid
BSD 3-Clause "New" or "Revised" License

permutation method for SDID #77

Open fell1121 opened 2 years ago

fell1121 commented 2 years ago

Hello, SDID community. On page 29 of the original SDID paper, the authors propose Algorithm 4 (Placebo Variance Estimation) for calculating a p-value when there are only a few treated units. For the synthetic control method, Abadie proposes a permutation method for calculating the p-value: "A permutation distribution can be obtained by iteratively reassigning the treatment to the units in the donor pool and estimating “placebo effects” in each iteration. Then, the permutation distribution is constructed by pooling the effect estimated for the treated unit together with placebo effects estimated for the units in the donor pool. The effect of the treatment on the unit affected by the intervention is deemed significant when its magnitude is extreme relative to the permutation distribution."

In my research, I calculated the p-value using both placebo variance estimation and a modified permutation method. SDID allows a gap between the SDID prediction and the true value, denoted alpha in equation 1.2 on page 4. I take the SDID prediction and subtract this gap; I can then run a permutation test in the same way as Abadie's. See graphs 2 and 3: in graph 2 I plot All Employees, Arts, Entertainment, and Recreation in Louisiana together with its SDID prediction, and in graph 3 I subtract the gap alpha from the SDID prediction and plot again.

I use SDID to estimate the impact of Hurricane Katrina on the labor market by sector. In the first attachment, LA_RE is monthly employment in Real Estate. Column 2 is the estimate, and column 3, "Group", is the end period. I set the treatment time to 20050801. Using Algorithm 4 (Placebo Variance Estimation), I obtained the SE in column 4. If I divide column 2 by the SE, every row except the first gives a high t-value in a Gaussian framework. I then used Abadie's permutation method to obtain the p-values in column 10, and these look fine too. I have used this method successfully to detect impacts on employment in many sectors, but for one sector it runs into a problem.
For All Employees, Arts, Entertainment, and Recreation in Louisiana, graphs 2 and 3 show a sharp decline. In attachment 4, although the standard error is low relative to the estimate in most rows, the p-value from the permutation method is very high. The reason is that SDID cannot find a good fit: "SDID prediction minus gap" has a large pre-treatment mean squared prediction error, ranging from 3 to 5, as shown in column 6 of attachment 4. In fact, using this method, I found that SDID produced one of the highest MSE_pre values for this variable. See attachment 5.
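For reference, the pooled permutation p-value described in the quoted Abadie passage can be sketched as below. This is only a minimal illustration, not code from the synthdid package: the function name `permutation_p_value` and the numbers are hypothetical, and it assumes the placebo effects have already been estimated by reassigning treatment to each donor unit.

```python
def permutation_p_value(treated_effect, placebo_effects):
    """Two-sided permutation p-value from a pooled distribution.

    The treated unit's effect is pooled with the placebo effects, and the
    p-value is the share of pooled effects whose magnitude is at least as
    large as the treated unit's.
    """
    pooled = [treated_effect] + list(placebo_effects)
    extreme = sum(1 for e in pooled if abs(e) >= abs(treated_effect))
    return extreme / len(pooled)


# Hypothetical, made-up effects purely for illustration.
treated = -4.0
placebos = [0.5, -1.2, 2.0, -0.3, 4.5, 0.9, -0.7, 1.1, -2.2]
p = permutation_p_value(treated, placebos)
print(p)
```

Because the treated unit is included in the pool, the smallest attainable p-value is 1 / (number of donor units + 1), so a small donor pool limits how significant the result can ever look.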

Here are my questions:

1. We can't divide the estimate by the standard error to get a t-value, right? Apparently we do not assume a Gaussian distribution.
2. Do you think this modified permutation test for SDID is valid?
3. For some variables, as in this case, SDID can't produce a low pre_MSE. What can we do?
4. It seems I always get an SDID prediction that is below the observed value. How can I change the default?
5. With the permutation test, reassigning treatment status to every comparison unit gives me an estimate and a standard error for every unit. How do I then compare the standard errors across all units?

[Attachments: five screenshots dated 2021-12-30 (12:41:19 AM, 1:19:40 AM, 1:19:57 AM, 12:49:35 AM, 12:57:44 AM)]

NH AER

mldoucette1 commented 2 years ago

fell1121, thanks for your comment. By any chance, could you share the code you used to obtain the pre-treatment RMSE?
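A pre-treatment RMSE of the kind asked about here could be computed along the following lines. This is a hypothetical sketch, not fell1121's actual code and not a synthdid function: the name `pre_treatment_mse` is made up, the gap is assumed to be the plain (unweighted) average pre-period difference between the observed and predicted series, and the sign convention for the gap is also an assumption.

```python
import math


def pre_treatment_mse(observed, predicted, n_pre, remove_gap=True):
    """Pre-treatment mean squared prediction error.

    observed, predicted: equal-length sequences ordered by time.
    n_pre: number of pre-treatment periods at the start of each sequence.
    remove_gap: if True, shift the prediction by the average pre-period
        difference (an assumed stand-in for the level gap) before
        computing the error.
    Returns (gap, mse).
    """
    obs_pre = observed[:n_pre]
    pred_pre = predicted[:n_pre]
    gap = 0.0
    if remove_gap:
        gap = sum(o - s for o, s in zip(obs_pre, pred_pre)) / n_pre
    residuals = [o - (s + gap) for o, s in zip(obs_pre, pred_pre)]
    mse = sum(r * r for r in residuals) / n_pre
    return gap, mse


# Hypothetical series: four pre-treatment periods, two post-treatment.
observed = [10, 12, 11, 13, 7, 6]
predicted = [8, 9, 10, 11, 10, 11]
gap, mse = pre_treatment_mse(observed, predicted, n_pre=4)
rmse = math.sqrt(mse)
print(gap, mse, rmse)
```

Setting `remove_gap=False` gives the error of the raw prediction, which makes it easy to see how much of the pre-period misfit is a level shift and how much is a difference in shape.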