insongkim / PanelMatch


Placebo test for ps weight matching not working #121

Closed. ilango2486 closed this issue 9 months ago.

ilango2486 commented 1 year ago

Hi,

I am trying to run a placebo test on a dataset I matched using the ps.weight refinement method. However, I keep running into the error "object 'matchedsets' not found". I went through my code in detail but could not find any mistake. Could you please look into it?

Thanks, Ilango

adamrauh commented 1 year ago

Hi @ilango2486, thanks for finding this. Would you be able to share a reproducible example? That will help a lot in figuring out the problem(s). Thanks!

-Adam

ilango2486 commented 1 year ago

Hi @adamrauh, thanks for the prompt reply. I have attached a smaller version of my dataset; the original has almost 1M rows, with about 1,300 treated units and 15,000 control units. Running the code below on the attached dataset produced the following warnings and error.

I tried different refinement methods, lag periods, and combinations of covariates (with and without lags), but I keep encountering the same issue. Please let me know if you need more information.

Thanks, Ilango

ps_ex_plac <- PanelMatch(lag = 4, time.id = "month", unit.id = "user_id",
                         treatment = "treatment", refinement.method = "ps.weight",
                         data = ps_ex, match.missing = FALSE,
                         covs.formula = ~ I(lag(X1, 1:4)) + I(lag(X2, 1:4)) + I(lag(X3, 1:4)) +
                           I(lag(X4, 1:4)) + I(lag(X5, 1:4)) + X6,
                         size.match = 10, qoi = "att", outcome.var = "y", lead = 0:6,
                         forbid.treatment.reversal = FALSE, placebo.test = TRUE)

Warning messages:
1: In PanelMatch(lag = 4, time.id = "month", unit.id = "user_id", treatment = "treatment",  :
  when placebo.test = TRUE, using the dependent variable in refinment is invalid
2: In perform_refinement(lag = lag, time.id = time.id, unit.id = unit.id,  :
  converting unit id variable data to integer

placebo_test(ps_ex_plac, data = ps_ex, lag.in = 4, number.iterations = 1000, confidence.level = 0.95)

Error in placebo_test(ps_ex_plac, data = ps_ex, lag.in = 4, number.iterations = 1000,  :
  object 'matchedsets' not found

ps_ex.csv

adamrauh commented 1 year ago

Hi @ilango2486, thanks for sharing this! I identified the bug and patched it in commit c6381e2f76063be7c5c3f55a4150ce78040fd350. If you update to the latest version of the se_comparison branch, things should work.

Thanks again for this. Let me know if you have other questions.

ilango2486 commented 1 year ago

Works perfectly now! Thank you @adamrauh

I have another question regarding the matching. Do you have any plans to parallelize the matching process? Since a separate matched set is created for each treated unit, I suspect the matching could run in parallel on multiple cores, which would speed up the process significantly. Currently, with my large dataset, each matching stage takes a few hours, and the placebo test took almost 2.5 days :)
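A minimal sketch of the idea, in R, under stated assumptions: in PanelMatch a matched set is formed for each treated observation from the control units that share its treatment history over the lag window, so that step can in principle be split across workers. The helper build_matched_set, the toy histories, and the use of parallel::mclapply are illustrative assumptions, not PanelMatch internals. Refinement methods such as ps.weight fit a propensity score model on pooled data, which is likely part of why parallelizing the whole pipeline is not trivial.

```r
library(parallel)

# Toy treatment histories over the 4 lag periods (hypothetical data, not ps_ex).
histories <- list(
  "1" = c(0, 0, 0, 0),  # treated unit
  "2" = c(0, 1, 0, 0),  # treated unit
  "3" = c(0, 0, 0, 0),
  "4" = c(0, 1, 0, 0),
  "5" = c(1, 1, 1, 1),
  "6" = c(0, 0, 0, 0)
)
treated_ids <- c(1, 2)
control_ids <- c(3, 4, 5, 6)

# Hypothetical helper: the matched set for one treated unit is the set of
# controls whose lagged treatment history is identical to the treated unit's.
# It only reads shared inputs, so calls for different treated units are independent.
build_matched_set <- function(treated_id, control_ids, histories) {
  target <- histories[[as.character(treated_id)]]
  keep <- vapply(
    control_ids,
    function(cid) identical(histories[[as.character(cid)]], target),
    logical(1)
  )
  control_ids[keep]
}

# mclapply forks worker processes; forking is unavailable on Windows,
# where mc.cores must be 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L
sets_parallel <- mclapply(treated_ids, build_matched_set,
                          control_ids = control_ids, histories = histories,
                          mc.cores = n_cores)
sets_serial <- lapply(treated_ids, build_matched_set,
                      control_ids = control_ids, histories = histories)

# treated unit 1 -> controls 3 and 6; treated unit 2 -> control 4
names(sets_parallel) <- treated_ids
print(sets_parallel)
```

On Windows, the same per-unit split could be run with parallel::makeCluster and parLapply instead of forking.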

ilango2486 commented 1 year ago

Another question: the placebo test gives a single estimate for each lag period, whereas the main results give an estimate for each lead period we define. Is each placebo estimate an average over all the lead periods?

adamrauh commented 1 year ago

Thanks again for finding that issue @ilango2486 :)

Regarding your first question: it is probably possible, though not trivial, and we are currently looking into it. In the meantime, you could at least speed up the placebo test by using unconditional standard errors. They should be quite similar to the bootstrap estimates but can be calculated much more quickly.
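Each bootstrap iteration re-estimates the quantity on a resample of units, so the runtime of a bootstrap grows roughly linearly with number.iterations, while an analytic standard error needs a single pass over the data. The toy sketch below illustrates both points with a simple mean and its textbook standard error; it is an assumption-laden stand-in, not PanelMatch's unconditional estimator. The toy data, the helper one_replicate, and the use of parallel::mclapply are all hypothetical.

```r
library(parallel)

# Hypothetical unit-level contributions to an estimate (toy data, not PanelMatch output).
set.seed(1)
unit_effects <- rnorm(200, mean = 0.5, sd = 1)

# One bootstrap replicate: resample units with replacement and recompute the mean.
one_replicate <- function(i, x) {
  mean(sample(x, length(x), replace = TRUE))
}

# L'Ecuyer-CMRG gives each forked worker its own reproducible RNG stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(2024)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Bootstrap iterations are independent, so they can be spread across cores.
boot_draws <- unlist(mclapply(seq_len(1000), one_replicate, x = unit_effects,
                              mc.cores = n_cores))
boot_se <- sd(boot_draws)

# Analytic standard error of a mean: one pass over the data, no resampling.
analytic_se <- sd(unit_effects) / sqrt(length(unit_effects))

# Percentile interval, analogous to confidence.level = 0.95
ci <- quantile(boot_draws, c(0.025, 0.975))
print(c(bootstrap = boot_se, analytic = analytic_se))
```

With 1,000 replicates the two standard errors should agree closely, while the analytic one costs a tiny fraction of the computation.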

As for the second question, I'm not entirely sure I understand it. There isn't any specific "mapping" between the periods shown in the placebo test and the lead window.

ilango2486 commented 1 year ago

Makes sense. Thanks again @adamrauh