tilburgsciencehub / website

Learn to work more efficiently on empirical research projects.
https://tilburgsciencehub.com

Building Block: Synthetic Control without Exogenous Variables #1090

Open lachlandeer opened 7 months ago

lachlandeer commented 7 months ago

A colleague sent me this email a week ago (I've edited it for clarity):

Did you ever estimate a synthetic control method (without exogenous X's)? I am currently using the Synth package in R developed by Abadie. However, this package requires exogenous variables, while recent marketing literature has estimated a synthetic control model without them. So my question is: do you maybe know how to estimate a synthetic control model in R without X variables?

It took me a while to work through this: I first had to understand the marketing literature on the topic and then figure out how to implement it.

The arguments for this approach come from the following papers:

Footnote 9 in "Do Activity-Based Incentive Plans Work? Evidence from a Large-Scale Field Intervention", published in JMR

Li (2020) and Kim, Lee, and Gupta (2020)

A building block showing how to do this would be super helpful.

I didn't invest time in trying to get the Synth package to work, but I could do this quickly with the tidysynth package. I'll put example code in R below the text.
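
To make the idea concrete before the tidysynth examples: without X's, the donor weights are chosen so that the weighted donor outcomes track the treated unit's pre-treatment outcomes. Here is a minimal base-R sketch of that idea on simulated data. Everything in it (the simulated panel, `to_simplex`, `loss`, the use of `optim`) is my own illustration and an assumption, not what Synth or tidysynth do internally; those packages also weight the predictors, which this sketch skips.

```r
# Simulated panel (all values hypothetical): 1 treated unit, 5 donors,
# 10 pre-treatment periods
set.seed(123)
n_pre    <- 10
n_donors <- 5
Y0 <- matrix(rnorm(n_pre * n_donors, mean = 50, sd = 5), nrow = n_pre)  # donor outcomes
true_w <- c(0.6, 0.4, 0, 0, 0)                                          # weights used to simulate
y1 <- as.vector(Y0 %*% true_w) + rnorm(n_pre, sd = 0.1)                 # treated unit outcomes

# Map unconstrained parameters to weights that are non-negative and sum to one
to_simplex <- function(theta) {
  e <- exp(theta - max(theta))
  e / sum(e)
}

# Pre-treatment fit: sum of squared gaps between treated and synthetic outcomes
loss <- function(theta) {
  w <- to_simplex(theta)
  sum((y1 - Y0 %*% w)^2)
}

# Start from equal weights and minimise the pre-treatment gap
fit   <- optim(rep(0, n_donors), loss, method = "BFGS")
w_hat <- to_simplex(fit$par)
round(w_hat, 3)
```

The softmax reparameterisation is only a convenient way to keep the weights on the simplex with a general-purpose optimiser; a quadratic programming solver would be the more usual choice.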

I would propose the following structure:

Here is example code that works:

library(dplyr)
library(haven)
library(Synth) # requires X's, so it is not used below
library(tidysynth) # used here because it lets me leave out X's

# ---- Example 1 ---- #

smoking_out <-
    smoking %>% # this data comes with the tidysynth package
    # initialise the synthetic control object
    synthetic_control(outcome = cigsale, # outcome
                      unit = state, # unit index in the panel data
                      time = year, # time index in the panel data
                      i_unit = "California", # unit where the intervention occurred
                      i_time = 1988, # time period when the intervention occurred
                      generate_placebos = TRUE # generate placebo synthetic controls (for inference)
    ) %>%
    # Use lags of the dependent variable as the only predictors
    generate_predictor(time_window = 1975,
                       cigsale_1975 = cigsale) %>%
    generate_predictor(time_window = 1980,
                       cigsale_1980 = cigsale) %>%
    generate_predictor(time_window = 1988,
                       cigsale_1988 = cigsale) %>%
    # Generate the fitted weights for the synthetic control
    generate_weights(optimization_window = 1970:1988, # time to use in the optimization task
                     margin_ipop = .02,sigf_ipop = 7,bound_ipop = 6 # optimizer options
    ) %>%

    # Generate the synthetic control
    generate_control()

#  How did we do?
smoking_out %>% 
    plot_trends()

smoking_out %>% 
    plot_differences()

# which units are weighted
smoking_out %>% plot_weights()

# gimme a balance table
smoking_out %>% 
    grab_balance_table()

# placebos?
smoking_out %>% 
    plot_placebos()

# significance levels 
smoking_out %>% 
    grab_significance(time_window = 1970:2000)

# --- Example 2 --- #
# adapted from Scott Cunningham's mixtape
prison_out <-
    read_stata("https://github.com/scunning1975/mixtape/raw/master/texas.dta") %>% # Texas prison data from the mixtape repository
    # initialise the synthetic control object
    synthetic_control(outcome = bmprison, # outcome
                      unit = statefip, # unit index in the panel data
                      time = year, # time index in the panel data
                      i_unit = 48, # unit where the intervention occurred (Texas has FIPS == 48)
                      i_time = 1993, # time period when the intervention occurred
                      generate_placebos = TRUE # generate placebo synthetic controls (for inference)
    ) %>%
    # Use lags of the dependent variable as the only predictors.
    # I picked every odd year; there is no theoretical reason for this choice.
    generate_predictor(time_window = 1985,
                       dep_1985 = bmprison) %>%
    generate_predictor(time_window = 1987,
                       dep_1987 = bmprison) %>%
    generate_predictor(time_window = 1989,
                       dep_1989 = bmprison) %>%
    generate_predictor(time_window = 1991,
                       dep_1991 = bmprison) %>%
    generate_predictor(time_window = 1993,
                       dep_1993 = bmprison) %>%
    # Generate the fitted weights for the synthetic control
    generate_weights(optimization_window = 1985:1993, # time to use in the optimization task
                     margin_ipop = .02,sigf_ipop = 7,bound_ipop = 6 # optimizer options
    ) %>%
    # Generate the synthetic control
    generate_control()

#  How did we do?
prison_out %>% 
    plot_trends()

prison_out %>% 
    plot_differences()

# which units are weighted
prison_out %>% plot_weights()

# gimme a balance table
prison_out %>% 
    grab_balance_table()

# placebos?
prison_out %>% 
    plot_placebos()

# significance levels 
prison_out %>% 
    grab_significance()
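
For anyone wondering what the significance output summarises: as far as I understand it, the placebo inference compares each unit's post-treatment fit to its pre-treatment fit (a ratio of mean squared prediction errors) and ranks the treated unit against the placebos. Here is a small base-R sketch of that logic on simulated gaps. The data, the variable names, and the size of the effect are all hypothetical, and this is not the tidysynth implementation itself.

```r
set.seed(42)
years    <- 1970:2000
treat_yr <- 1988   # last pre-treatment year, mirroring Example 1
units    <- c("treated", paste0("placebo_", 1:9))

# Hypothetical gaps (actual minus synthetic outcome) for each unit and year
gaps <- matrix(rnorm(length(years) * length(units), sd = 2),
               nrow = length(years),
               dimnames = list(as.character(years), units))

# Add an artificial post-treatment effect for the treated unit only
gaps[years > treat_yr, "treated"] <- gaps[years > treat_yr, "treated"] - 15

# Mean squared prediction error before and after the intervention
pre       <- years <= treat_yr
pre_mspe  <- colMeans(gaps[pre, ]^2)
post_mspe <- colMeans(gaps[!pre, ]^2)
ratio     <- post_mspe / pre_mspe

# Rank units by ratio (1 = largest) and turn the rank into a permutation-style p-value
rnk     <- rank(-ratio)
p_value <- rnk / length(units)
data.frame(unit = units, ratio = round(ratio, 2), rank = rnk, p_value = p_value)
```

With only ten units the smallest attainable p-value is 0.1, which is worth keeping in mind when the donor pool is small.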

srosh2000 commented 5 months ago

@lachlandeer This is an interesting issue. I wonder whether we should also have a more introductory building block on estimation with the synthetic control method. This one could then be a follow-up to it.

lachlandeer commented 5 months ago

@srosh2000 Yes, it might be nice to have a couple of synthetic control issues (basics, "what's new") and then this as well.

Some of the building blocks are getting quite long. I'd prefer multiple short ones over a single long one, since that makes it much easier to find what one is looking for.

srosh2000 commented 4 months ago

@VirginiaMirabile As mentioned above, you can use the dataset that comes with the tidysynth package.