ssdavenport / microsynth

Synthetic controls for micro-level data

Poor matching in backup method #21

Closed · alannaflores closed this issue 2 years ago

alannaflores commented 3 years ago

Hi! I have been working with your package for a while and have consistently had a problem with the backup method producing poor pre-intervention fits.

Using the backup is required when minimizing over time-varying covariates via match.out.min, but with the backup enabled I get significantly poorer matches than without it.

Here is an example of the problem:

I am looking at a Pennsylvania law passed in 2000.

First example when I do not have the backup parameter turned on and no covariates: example1.pdf

Second example with all the same parameters and backup turned on: example2.pdf

Third example with covariates added in match.out.min and backup turned on: example3.pdf

Fourth example with covariates added in match.out and backup turned off: example4.pdf

microsynth_bug.zip

I have attached all my code and my dataset and would really appreciate any comments you have on why the backup parameter results in such poor RMSE values and what alternatives there are for minimizing over time-varying covariates.

michaelwrobbins commented 3 years ago

First, please check out this JSS paper if you haven’t seen it:

https://www.jstatsoft.org/article/view/v097i02

It explains that the microsynth algorithm starts by assuming weights exist that satisfy all constraints (i.e., that an exact match between treatment and synthetic control is feasible); in that case, the calibrate() function from the survey package is used to find the weights. If no such weights exist, a series of backup models is used that, in essence, finds weights satisfying the constraints as closely as possible rather than exactly; here the LowRankQP() function is used to find the weights instead. Setting use.backup = FALSE tells microsynth to skip the LowRankQP() step and simply return whatever calibrate() produces while attempting an exact match across all constraints (in many cases, what's returned there will be REALLY bad).
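To make the backup idea concrete, here is a toy sketch in Python (not the package's R internals; the matrix A, target b, and loss function are all made up for illustration). When there are more constraints than control units, an exact solution generally doesn't exist, so the backup-style fit minimizes the squared constraint violation subject to the weights being nonnegative and summing to one:

```python
import numpy as np
from scipy.optimize import minimize

# Toy matching problem: each row of A is one constraint (e.g., a pre-period
# outcome total across control units); b holds the treatment targets.
rng = np.random.default_rng(0)
n_controls, n_constraints = 5, 8  # more constraints than weights,
A = rng.uniform(0.5, 1.5, size=(n_constraints, n_controls))  # so an exact
b = rng.uniform(2.0, 4.0, size=n_constraints)                # match is infeasible

# Backup-style fit: minimize squared constraint violation subject to
# sum(w) = 1 (the "Intercept" constraint) and w >= 0.
def loss(w):
    return np.sum((A @ w - b) ** 2)

res = minimize(
    loss,
    x0=np.full(n_controls, 1.0 / n_controls),
    bounds=[(0, None)] * n_controls,
    constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
)
w = res.x
# w.sum() equals 1 up to solver tolerance, even though A @ w != b exactly.
```

The key design choice this illustrates: the exact-match path can return weights violating the sum-to-one constraint, whereas the backup path enforces that constraint and relaxes the others.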

So, the matching summary from your example 1 is as follows (in which case weights that don’t exactly match treatment and synthetic control are found by the calibrate function):

                                   Targets Weighted.Control All.scaled 
Intercept                                1           1.7281     1.0000
FirearmSuicide_M_Adult_Deaths.1999     592         600.9396   224.7931
FirearmSuicide_M_Adult_Deaths.1998     668         666.0581   232.5862
FirearmSuicide_M_Adult_Deaths.1997     660         649.1496   232.0000
FirearmSuicide_M_Adult_Deaths.1996     643         652.6334   240.3448
FirearmSuicide_M_Adult_Deaths.1995     694         689.1775   236.3103

You might think this is good but it really isn’t. Notice that the intercept on the weighted control is 1.72. This means the weights sum to 1.72 when they really should sum to 1 (which is the number of cases in the treatment group). When you set use.backup = TRUE, it’s using a model that says the weights have to sum to one and then finds the weights that satisfy the other constraints as closely as possible (this process is described in the JSS paper).
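To spell out the intercept diagnostic: the Intercept row of the matching summary is just the sum of the control weights, so it should equal the number of treated cases (here, 1). A minimal check, using entirely hypothetical weights chosen to reproduce the problem:

```python
# Hypothetical control weights (NOT the actual calibrate() output from
# this dataset) whose sum illustrates the ~1.73 intercept problem.
weights = [0.91, 0.42, 0.25, 0.15]

# The "Intercept" entry in the matching summary is simply the weight total.
intercept = sum(weights)
print(round(intercept, 2))  # -> 1.73, but with one treated unit it should be 1.0
```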

In summary, the microsynth algorithm actually considers the Example 2 weights to be better than the Example 1 weights because the former sum to one when the latter do not. In the development of this algorithm and method, I went back and forth on how important it is that the weights sum to the number of cases in the treatment group. In the end (based on some other datasets I was using), I decided it was very important, because if you don’t impose this restriction the results can be quite misleading.

-MR