`lpdid`: compute pre-treatment `ATT`

Wenzhi-Ding commented 4 months ago

I have been exploring these new DID methods recently. Maybe I can take this on, as well as integrating other prevalent new DID methods into pyfixest, like Callaway and Sant'Anna (2021) and Sun and Abraham (2021), if you think it is good to have these functions inside pyfixest. Alternatively, I could write independent packages that call pyfixest. I am not sure which approach is better from a design perspective.

s3alfisc commented 4 months ago

That would be fantastic @Wenzhi-Ding! All contributions in this area would be very much appreciated. I'd be open to either integrating new estimators into pyfixest or you starting a new repo (if you do, one option would be to simply "fork" the did module into a standalone repo and build upon it?). My suggestion is to start within pyfixest (as the module will be easier for users to find), and then we could decide whether it makes sense to have a standalone project in the future?

On the did estimators that are implemented, there are a few things that I think would benefit from a second look / a caring hand. I'm mostly listing them here, not necessarily in order of importance :D

- It would indeed be nice if the lpdid function were able to compute pre-treatment ATTs as described above =D
- Beyond that, the linear projections paper suggests some extensions, e.g. equally weighted ATTs and non-absorbing treatments (e.g. workers joining and then leaving unions, turning marketing campaigns on and off, etc.)
- In general, I am not sure if I have created the best API for the abstract DID class. For example, at the moment, I infer the treatment timing based on a time variable ("tname") and a treatment assignment year variable ("gname"). For this to work, I have to ask users to provide integers in the YYYYMMDD format to be able to compute treatment status, which is likely error-prone?
- It would also be great to standardize the naming of the variables that are produced by the different did methods. For example, lpdid produces the following coefficient names:

%load_ext autoreload
%autoreload 2

import numpy as np  # needed for np.inf in the did2s call further down
import pandas as pd
from pyfixest.did.estimation import lpdid, event_study, did2s

url = "https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/did/data/df_het.csv"
df_het = pd.read_csv(url)

fit = lpdid(
    df_het,
    yname="dep_var",
    idname="unit",
    tname="year",
    gname="g",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=False
)

fit.tidy().index
#Index(['time_to_treatment::-20', 'time_to_treatment::-19',
#       'time_to_treatment::-18', 'time_to_treatment::-17',
#       'time_to_treatment::-16', 'time_to_treatment::-15',
#       'time_to_treatment::-14', 'time_to_treatment::-13',
# etc

while did2s returns

fit = did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | unit + year",
    second_stage="~i(rel_year)",
    treatment="treat",
    cluster="state",
    i_ref1=[-1.0, np.inf],
)
fit._coefnames 
#['C(rel_year,contr.treatment(base=-1.0))[T.-20.0]',
# 'C(rel_year,contr.treatment(base=-1.0))[T.-19.0]',
# 'C(rel_year,contr.treatment(base=-1.0))[T.-18.0]',
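
To make the naming mismatch concrete, here is a minimal sketch of how the two formats shown above could be mapped to a common key. The helper names `relative_period` and `standardize` are hypothetical and not part of pyfixest, and the `time_to_treatment::<period>` target format is only one possible choice, picked here because lpdid already uses it.

```python
import re

# Matches a signed integer or float at the very end of a coefficient name,
# optionally followed by a closing bracket, e.g. "...::-20" or "...[T.-20.0]".
_REL_PERIOD = re.compile(r"(-?\d+(?:\.\d+)?)\]?$")


def relative_period(coefname: str) -> int:
    """Extract the period relative to treatment from a coefficient name."""
    match = _REL_PERIOD.search(coefname)
    if match is None:
        raise ValueError(f"no relative period found in {coefname!r}")
    return int(float(match.group(1)))


def standardize(coefname: str, prefix: str = "time_to_treatment") -> str:
    """Rewrite a coefficient name as '<prefix>::<period>' (hypothetical target format)."""
    return f"{prefix}::{relative_period(coefname)}"
```

Applied to `fit.tidy().index` or `fit._coefnames`, something like this would give both estimators the same labels without touching the estimation code, although a proper fix would probably set the names where the coefficients are created.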

- Also, lpdid returns a pd.DataFrame, while event_study() and did2s() return objects of type Feols. I don't love that, but I also don't really know how to allow lpdid to return an object of type Feols.
- The easiest "new" did methods to implement would likely be the Sun-Abraham method and the Wooldridge extended two-way fixed effects estimator (which is nice because it also works for Poisson regression). The CS method is actually already implemented in this Python package by @bernardodionisi.
- In general, I tried to do a good job of providing a general abstract DID class, but would also be happy about feedback on this =)
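
On the YYYYMMDD point in the list above, a rough sketch of what stricter input checking could look like. This is not how pyfixest computes treatment status; the function names are hypothetical, and the convention that `first_treated == 0` marks never-treated units is an assumption made only for this example.

```python
from datetime import date, datetime


def parse_yyyymmdd(value: int) -> date:
    """Parse an integer such as 20200131 into a date, rejecting other layouts."""
    text = str(value)
    # strptime alone is lenient about digit counts, so check the length first.
    if len(text) != 8:
        raise ValueError(f"expected an 8-digit YYYYMMDD integer, got {value!r}")
    return datetime.strptime(text, "%Y%m%d").date()


def is_treated(time: int, first_treated: int) -> bool:
    """Return True if a unit first treated at `first_treated` is treated at `time`.

    Assumption for this sketch: first_treated == 0 marks never-treated units.
    """
    if first_treated == 0:
        return False
    return parse_yyyymmdd(time) >= parse_yyyymmdd(first_treated)
```

Failing loudly on a plain year such as `2020` would at least surface the format requirement early, instead of silently producing a wrong treatment indicator.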

Wenzhi-Ding commented 4 months ago

This is super informative! I will also think about the issues you mentioned. I also agree that integrating everything in one place makes it easier for researchers to find (a one-stop solution).

Abstracting a standard DID class would be super cool and influential. That way, researchers could quickly verify their results across different models. I am still catching up on the progress of this literature, so I may not be able to contribute code quickly. But if there is any related discussion on this topic, please do notify me. I am more than willing to engage in the discussion.
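
As a rough illustration of what cross-model verification could look like once estimators share an interface, here is a minimal sketch. Everything in it is hypothetical: `EventStudyEstimator`, `FixedEstimates` and `compare` are made-up names for this example and do not describe pyfixest's existing abstract DID class.

```python
from abc import ABC, abstractmethod


class EventStudyEstimator(ABC):
    """Hypothetical common interface; not pyfixest's actual DID class."""

    @abstractmethod
    def estimates(self) -> dict[int, float]:
        """Return point estimates keyed by period relative to treatment."""


class FixedEstimates(EventStudyEstimator):
    """Stand-in estimator that returns precomputed numbers, for illustration only."""

    def __init__(self, values: dict[int, float]) -> None:
        self._values = dict(values)

    def estimates(self) -> dict[int, float]:
        return dict(self._values)


def compare(models: dict[str, EventStudyEstimator]) -> dict[int, dict[str, float]]:
    """Line up estimates from several models by relative period."""
    table: dict[int, dict[str, float]] = {}
    for name, model in models.items():
        for period, value in model.estimates().items():
            table.setdefault(period, {})[name] = value
    # Sort by relative period so pre- and post-treatment rows appear in order.
    return dict(sorted(table.items()))
```

Keying everything by an integer relative period is the design choice that matters here: it sidesteps the differing coefficient names and lets periods that only some estimators report show up as partially filled rows.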

py-econometrics / pyfixest

`lpdid`: compute pre-treatment `ATT` #268