s3alfisc opened 5 months ago
That would be fantastic @Wenzhi-Ding! All contributions in this area would be very much appreciated. I'd be open to either integrating new estimators into `pyfixest` or you starting a new repo (if you do, one option would be to simply "fork" the `did` module into a standalone repo and build upon it?). My suggestion is to start within `pyfixest` (as the module will be easier for users to find), and then we could decide later whether it makes sense to have a standalone project?
On the `did` estimators that are implemented, there are a few things that I think would benefit from a second look / a caring hand. I'm mostly listing them here, not necessarily in order of importance :D
Ideally, the `lpdid` function would be able to compute pre-treatment ATTs as described above =D

The `DID` class: for example, at the moment I infer the treatment timing from a "time" variable (`tname`) and a treatment assignment year variable (`gname`). For this to work, I have to ask users to provide integers in YYYYMMDD format to be able to compute treatment status, which is likely error-prone?

Coefficient names are not consistent across estimators. `lpdid` produces the following coefficient names:

```python
import pandas as pd
from pyfixest.did.estimation import lpdid, event_study, did2s

url = "https://raw.githubusercontent.com/s3alfisc/pyfixest/master/pyfixest/did/data/df_het.csv"
df_het = pd.read_csv(url)

# Local projections DiD with a 20-period pre and post window, clustered by state
fit = lpdid(
    df_het,
    yname="dep_var",
    idname="unit",
    tname="year",
    gname="g",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=False,
)
fit.tidy().index
# Index(['time_to_treatment::-20', 'time_to_treatment::-19',
#        'time_to_treatment::-18', 'time_to_treatment::-17',
#        'time_to_treatment::-16', 'time_to_treatment::-15',
#        'time_to_treatment::-14', 'time_to_treatment::-13',
#        ...
```
while `did2s` returns

```python
import numpy as np  # needed for np.inf below

# df_het and did2s as in the lpdid snippet
fit = did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | unit + year",
    second_stage="~i(rel_year)",
    treatment="treat",
    cluster="state",
    i_ref1=[-1.0, np.inf],
)
fit._coefnames
# ['C(rel_year,contr.treatment(base=-1.0))[T.-20.0]',
#  'C(rel_year,contr.treatment(base=-1.0))[T.-19.0]',
#  'C(rel_year,contr.treatment(base=-1.0))[T.-18.0]',
#  ...
```
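To make the two naming schemes comparable in the meantime, a small parser could map both onto a shared integer event-time index. This is a hypothetical sketch: neither `event_time` nor `harmonise` exists in pyfixest, and it assumes the names always follow the two patterns shown above, `time_to_treatment::<k>` for `lpdid` and `...[T.<k>]` for `did2s`.

```python
import re

import pandas as pd

# Hypothetical helpers (not part of pyfixest). Each pattern captures the
# relative period, which may be written as an integer or as a float.
_PATTERNS = [
    # lpdid style: 'time_to_treatment::-20'
    re.compile(r"^time_to_treatment::(-?\d+(?:\.\d+)?)$"),
    # did2s style: 'C(rel_year,contr.treatment(base=-1.0))[T.-20.0]'
    re.compile(r"\[T\.(-?\d+(?:\.\d+)?)\]$"),
]


def event_time(coefname: str) -> int:
    """Return the relative period encoded in a coefficient name."""
    for pattern in _PATTERNS:
        match = pattern.search(coefname)
        if match:
            return int(float(match.group(1)))
    raise ValueError(f"unrecognised coefficient name: {coefname!r}")


def harmonise(names) -> pd.Index:
    """Map estimator-specific coefficient names to a common integer index."""
    return pd.Index([event_time(n) for n in names], name="event_time")
```

With a shared index, coefficient tables from both estimators could be lined up with `pd.concat(..., axis=1)`. Having the estimators emit identical names in the first place would of course be the cleaner fix; a parser like this is only a stopgap.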
`lpdid` returns a `pd.DataFrame`, while `event_study()` and `did2s()` return objects of type fixest. I don't love it, but I also don't really know how to allow `lpdid` to return an object of type `Feols`.
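One low-effort option might be to leave `lpdid`'s internals alone and wrap its coefficient table in a small class exposing the same accessor names as `Feols` (`tidy()`, `coef()`, `se()`). The class below is a hypothetical sketch, not pyfixest code, and the column names `"Estimate"` and `"Std. Error"` are an assumption about the layout of the table being wrapped.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class EventStudyResult:
    """Hypothetical wrapper giving a coefficient table a Feols-like surface."""

    table: pd.DataFrame  # one row per coefficient, indexed by coefficient name

    def tidy(self) -> pd.DataFrame:
        # Return a copy so callers cannot mutate the stored results.
        return self.table.copy()

    def coef(self) -> pd.Series:
        # Assumed column name for the point estimates.
        return self.table["Estimate"]

    def se(self) -> pd.Series:
        # Assumed column name for the standard errors.
        return self.table["Std. Error"]


# Placeholder numbers, only to show the interface (not real estimates).
res = EventStudyResult(
    pd.DataFrame(
        {"Estimate": [0.1, 0.3], "Std. Error": [0.05, 0.07]},
        index=["time_to_treatment::0", "time_to_treatment::1"],
    )
)
```

Code that only calls these methods (a plotting helper, say) could then treat all three estimators alike; anything relying on other `Feols` internals would still need separate handling.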
`DID` class, but I would also be happy to get feedback on this =)

This is super informative! I will also think about the issues you mentioned. I also agree that integrating everything in one place makes it easier for researchers to find (a one-stop solution).
Abstracting a standard DID class would be super cool and influential: researchers could then quickly verify their results across different models. I am still catching up with this literature, so I may not be able to contribute code quickly. But if there is any related discussion on this topic, please do notify me. I am more than willing to engage in the discussion.
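On the treatment-timing point raised for the `DID` class: one way to drop the YYYYMMDD-integer requirement might be to compare the time and cohort columns directly, and to count relative time in observed periods instead of subtracting dates. The helper below is a hypothetical sketch, not pyfixest code. It assumes never-treated units have a missing `gname` (or a marker passed as `never_treated`), and that every cohort start coincides with an observed period.

```python
import numpy as np
import pandas as pd


def add_event_time(df, tname, gname, never_treated=None):
    """Hypothetical helper: add treatment status and relative period per row.

    Works for any sortable time type (integer years, pandas Timestamps, ...)
    because it only compares values and counts observed periods.
    """
    out = df.copy()
    cohort = out[gname]
    if never_treated is not None:
        # Recode the never-treated marker (e.g. 0) as missing.
        cohort = cohort.mask(cohort == never_treated)

    # Treated from the first treated period onwards; NaN/NaT compare as False.
    out["treat"] = out[tname] >= cohort

    # Position of every observed period in sorted order.
    periods = pd.Index(out[tname].unique()).sort_values()
    t_pos = periods.get_indexer(out[tname])
    # -1 when the cohort is missing or is not an observed period.
    g_pos = periods.get_indexer(cohort)
    rel = (t_pos - g_pos).astype(float)
    rel[g_pos == -1] = np.nan
    out["rel_time"] = rel
    return out
```

For the integer coding used in `df_het`, this would be called as `add_event_time(df_het, "year", "g", never_treated=0)`, assuming never-treated units there carry `g == 0`. One design consequence worth discussing: because `rel_time` counts observed periods, a period missing from the whole dataset is skipped rather than counted.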
I have been exploring these new DID methods recently. Maybe I can take this on, as well as integrate other prevalent new DID methods like Callaway and Sant'Anna (2021) and Sun and Abraham (2021) into `pyfixest`, if you think it is good to have these functions inside `pyfixest`. Alternatively, I could write independent packages that call `pyfixest`. I'm not sure which approach is better from a design perspective.
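For a sense of what a Callaway and Sant'Anna (2021) integration would build on, the sketch below computes unconditional group-time effects ATT(g, t) as long differences against never-treated units. It is hypothetical code, not the API of pyfixest or of any existing package. It assumes a balanced panel with integer periods, never-treated units coded as `gname == 0`, no covariates, and only post-treatment cells (t ≥ g), each compared with the base period g - 1.

```python
import pandas as pd


def att_gt(df, yname, idname, tname, gname):
    """Unconditional ATT(g, t) with never-treated units as the comparison group."""
    wide = df.pivot(index=idname, columns=tname, values=yname)  # units x periods
    cohort = df.groupby(idname)[gname].first()  # first treated period per unit
    never = cohort.index[cohort == 0]
    rows = []
    for g in sorted(c for c in cohort.unique() if c != 0):
        if g - 1 not in wide.columns:
            continue  # no pre-treatment base period to difference against
        treated = cohort.index[cohort == g]
        for t in wide.columns:
            if t < g:
                continue  # this sketch skips pre-treatment cells
            # Long difference Y_t - Y_{g-1} for every unit.
            diff = wide[t] - wide[g - 1]
            att = diff.loc[treated].mean() - diff.loc[never].mean()
            rows.append({"g": g, "t": t, "att": att})
    return pd.DataFrame(rows)
```

A full implementation would add the covariate adjustments (outcome regression, IPW, doubly robust), not-yet-treated comparison groups, aggregation of the ATT(g, t) into event-study or overall effects, and inference; this only shows the 2x2 building block.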