ssdavenport / microsynth

Synthetic controls for micro-level data
16 stars 9 forks source link

Error in displayed MSE #20

Closed JoakimWeill closed 3 years ago

JoakimWeill commented 3 years ago

Hi,

The MSE displayed by microsynth when using backup models seems too low. If so this would lead microsynth to accept the third model (when use.backup = T), even if the true MSE is much higher. See example below adapted from your tutorial using a single treated unit:

library(reprex)
library(microsynth)
library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

match.out <- c("any_crime")
cov.var <- c("TotalPop", "BLACK", "HISPANIC", "Males_1521", "HOUSEHOLDS", 
             "FAMILYHOUS", "FEMALE_HOU", "RENTER_HOU", "VACANT_HOU")

set.seed(86880)
ids.t <- names(table(seattledmi$ID[seattledmi$Intervention==1]))
ids.c <- names(table(seattledmi$ID[seattledmi$Intervention==0]))
ids.synth <- c(sample(ids.t, 1), sample(ids.c, 100))
seattledmi.one <- seattledmi[is.element(seattledmi$ID, as.numeric(ids.synth)), ]

sea8 <- microsynth(seattledmi.one, 
                   idvar="ID", timevar="time", intvar="Intervention", 
                   match.out=match.out, match.covar=cov.var, 
                   result.var=match.out,
                   test="lower", perm=250, jack=FALSE, 
                   check.feas=TRUE, use.backup=TRUE,
                   n.cores = min(parallel::detectCores(), 2))
#> Setting end.pre = 12.
#> WARNING: There is a low number (2) of cases in the treatment or intervention group.
#> Setting use.survey = FALSE.
#> Be cautious of results involving linearization or confidence intervals.
#> Calculating weights...
#> Checking feasibility of first model...
#> First model is infeasible: Time = 0.01
#> Checking feasibility of second model...
#> Second model is infeasible: Time = 0
#> Will use third model.
#> Created main weights for synthetic control: Time = 0.02
#> Matching summary for main weights:
#>              Targets Final.Weighted.Control All.scaled
#> Intercept          2                 2.0000     2.0000
#> TotalPop         294               308.1972   124.1584
#> BLACK             41                19.5505     9.5644
#> HISPANIC          28                13.0530     6.9901
#> Males_1521         3                 2.4229     4.0594
#> HOUSEHOLDS       213               207.5760    57.6436
#> FAMILYHOUS        34                41.8596    25.7822
#> FEMALE_HOU         3                 5.4056     4.0990
#> RENTER_HOU       210               185.6259    32.8317
#> VACANT_HOU        33                11.1796     3.9604
#> any_crime.12      20                17.3672     3.7822
#> any_crime.11      25                14.0115     3.8020
#> any_crime.10      14                10.4546     3.2475
#> any_crime.9       17                13.1787     2.9505
#> any_crime.8       19                12.1810     3.3069
#> any_crime.7       24                13.9868     3.2871
#> any_crime.6       18                20.2189     3.4059
#> any_crime.5       19                 9.8776     3.0891
#> any_crime.4       24                16.7462     3.4257
#> any_crime.3       23                18.5420     3.6238
#> any_crime.2       19                18.8779     2.8119
#> any_crime.1       12                13.6541     3.1683
#> 
#> Calculating weights for permutation groups...
#> Parallelizing with n.cores = 2...
#> Completed calculation of all replication weights: Time = 2.82
#> 
#> Removing 0 permutation groups with pre-intervention MSE > cut.mse.
#> 
#> Calculation of weights complete: Total time = 2.88
#> 
#> Calculating basic statistics for end.post = 16...
#> Completed calculation of basic statistics for end.post = 16.  Time = 0.09
#> 
#> Calculating survey statistics for end.post = 16...
#> Completed survey statistics for main weights: Time = 1.03
#> Completed calculation of survey statistics for end.post = 16.  Time = 1.03
#> 
#> microsynth complete: Overall time = 4.05

sea8$w$Summary
#>              Targets Final.Weighted.Control All.scaled
#> Intercept          2               2.000000   2.000000
#> TotalPop         294             308.197220 124.158416
#> BLACK             41              19.550488   9.564356
#> HISPANIC          28              13.052951   6.990099
#> Males_1521         3               2.422879   4.059406
#> HOUSEHOLDS       213             207.576029  57.643564
#> FAMILYHOUS        34              41.859589  25.782178
#> FEMALE_HOU         3               5.405586   4.099010
#> RENTER_HOU       210             185.625871  32.831683
#> VACANT_HOU        33              11.179575   3.960396
#> any_crime.12      20              17.367199   3.782178
#> any_crime.11      25              14.011529   3.801980
#> any_crime.10      14              10.454636   3.247525
#> any_crime.9       17              13.178666   2.950495
#> any_crime.8       19              12.181019   3.306931
#> any_crime.7       24              13.986819   3.287129
#> any_crime.6       18              20.218945   3.405941
#> any_crime.5       19               9.877633   3.089109
#> any_crime.4       24              16.746176   3.425743
#> any_crime.3       23              18.541982   3.623762
#> any_crime.2       19              18.877878   2.811881
#> any_crime.1       12              13.654130   3.168317

plot_microsynth(sea8)

MSE <- sea8$w$MSE %>% t() %>% as.data.frame() %>% 
  rownames_to_column() %>% as_tibble() %>%
  filter(rowname == "Main") %>%
  glimpse()
#> Rows: 1
#> Columns: 7
#> $ rowname                              <chr> "Main"
#> $ `First model: Primary variables`     <dbl> 114.4362
#> $ `First model: Second variables`      <dbl> NA
#> $ `Secondary model: Primary variables` <dbl> 460.6346
#> $ `Second model: Secondary variables`  <dbl> 38.74974
#> $ `Third model: Primary variables`     <dbl> 1.972152e-31
#> $ `Third model: Secondary variables`   <dbl> 119.8855

Groups <- sea8$w$keep.groups  %>% as.data.frame() %>% 
  rownames_to_column() %>% as_tibble() %>%
  filter(rowname == "Main") %>%
  glimpse()
#> Rows: 1
#> Columns: 2
#> $ rowname <chr> "Main"
#> $ .       <lgl> TRUE

Models <- sea8$w$Model  %>% as.data.frame() %>% 
  rownames_to_column() %>% as_tibble() %>%
  filter(rowname == "Main") %>%
  glimpse()
#> Rows: 1
#> Columns: 2
#> $ rowname <chr> "Main"
#> $ .       <chr> "Third"

Looking at differences in outcomes between Targets and Final.Weighted.Control, it seems that the MSE is pretty high for any_crime (the only outcome included). Yet, theThird model: Primary variables shows an MSE of 1.97e-31. Am I missing something?

Created on 2021-02-11 by the reprex package (v0.3.0)

ssdavenport commented 3 years ago

this has been addressed in most recent patch: v2.0.20