matheusfacure / python-causality-handbook

Causal Inference for the Brave and True. A light-hearted yet rigorous approach to learning about impact estimation and causality.
https://matheusfacure.github.io/python-causality-handbook/landing-page.html
MIT License

Possible Error in Ch 25: Calculate Regularization Function #313

Closed: trevorvogel closed this issue 1 year ago

trevorvogel commented 1 year ago

After reading the seminal paper on synthetic DiD, I think the way zeta is calculated in calculate_regularization may be incorrect, depending on how the input data is ordered. As it stands, the function calculates the true first difference only if the data frame is already ordered by time within each state/unit. Otherwise, it calculates a "first difference" between whatever observations happen to be adjacent for each state/unit. I believe the following edit will solve this problem and ensure the first difference is always calculated properly. Thanks!

    first_diff_std = (data
                      .query(f"~{post_col}")
                      .query(f"~{treat_col}")
                      .sort_values(year_col)
                      .groupby(state_col)[outcome_col]
                      .diff()
                      .std())
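To illustrate why row order matters here, the following is a minimal sketch on a toy panel. The helper name `pre_control_first_diff_std`, its `sort` flag, and the column names (`state`, `year`, `y`, `treated`, `post`) are assumptions made for this example and are not taken from the handbook. It only reproduces the first-difference step from the snippet above, not the full calculate_regularization function.

```python
import pandas as pd


def pre_control_first_diff_std(data, outcome_col, year_col, state_col,
                               treat_col, post_col, sort=True):
    """Std of within-unit first differences over pre-period control rows.

    Hypothetical helper for illustration; `sort=False` mimics relying on
    whatever row order the input happens to have.
    """
    pre_control = data.query(f"~{post_col}").query(f"~{treat_col}")
    if sort:
        # Order by time so that diff() compares consecutive periods.
        pre_control = pre_control.sort_values(year_col)
    return pre_control.groupby(state_col)[outcome_col].diff().std()


# Toy panel: two control states, four pre-treatment years each (made-up data).
ordered = pd.DataFrame({
    "state": ["A"] * 4 + ["B"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "y": [1.0, 2.0, 4.0, 7.0, 10.0, 10.5, 12.0, 12.5],
    "treated": False,
    "post": False,
})

# Same rows, but no longer in time order within each state.
shuffled = ordered.iloc[[2, 0, 3, 1, 6, 4, 7, 5]]

cols = ("y", "year", "state", "treated", "post")
std_ordered = pre_control_first_diff_std(ordered, *cols)
std_shuffled_sorted = pre_control_first_diff_std(shuffled, *cols)
std_shuffled_unsorted = pre_control_first_diff_std(shuffled, *cols, sort=False)

print(std_ordered, std_shuffled_sorted, std_shuffled_unsorted)
```

With the sort in place, the shuffled frame gives the same value as the ordered one. Without it, the differences are taken between arbitrary rows and the resulting standard deviation changes.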

matheusfacure commented 1 year ago

Good point! Thanks!