DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
Hi all, I'm new to the GitHub community and Python. I have two questions regarding the dowhy package:
Output Replicability Issue:
The estimate from dowhy varies slightly each time I run it, and setting random seeds has not helped. How can I ensure that the output is consistent and replicable? (See the seeding sketch in the code below.)
Discrepancy with Statsmodels:
I've noticed significant differences between the treatment effect estimates (ATE, ATT, ATC) obtained from dowhy and those generated by statsmodels, even though both start from the same covariates and a propensity score setup (statsmodels via a probit selection model with IPW; dowhy via propensity score matching). Can anyone provide guidance on resolving this discrepancy? (See the IPW sketch in the code below.)
Expected behavior
- DoWhy output should be replicable (i.e., exactly the same value on every run).
- The output should be very similar to statsmodels/Stata/R outputs.
Version information:
DoWhy version: 0.11.1
Additional context
My code is below for reference:
# Imports (assumed; reconstructed from the statsmodels treatment-effects
# example this snippet follows)
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from statsmodels.discrete.discrete_model import Probit
from statsmodels.regression.linear_model import OLS
from statsmodels.treatment.treatment_effects import TreatmentEffect
from statsmodels.treatment.tests.results import results_teffects as res_st
from dowhy import CausalModel

# Dataset loading: cataneo2.csv ships with statsmodels' test suite
cur_dir = os.path.abspath(os.path.dirname(res_st.__file__))
file_name = 'cataneo2.csv'
file_path = os.path.join(cur_dir, file_name)
dta_cat = pd.read_csv(file_path)
# Estimator names and the corresponding reference results bundled with
# statsmodels' test suite (kept for comparison; not used directly below)
methods = ['ra', 'ipw', 'aipw', 'aipw_wls', 'ipw_ra']
methods_st = [
    ("ra", res_st.results_ra),
    ("ipw", res_st.results_ipw),
    ("aipw", res_st.results_aipw),
    ("aipw_wls", res_st.results_aipw_wls),
    ("ipw_ra", res_st.results_ipwra),
]
pd.set_option('display.width', 500)
print(dta_cat.head())  # preview the data
# Statsmodels approach
# Treatment selection model: probit model
formula = 'mbsmoke_ ~ mmarried_ + mage + mage2 + fbaby_ + medu'
res_probit = Probit.from_formula(formula, dta_cat).fit() # Estimate the probability of smoking
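# For the discrepancy question: the two pipelines fit different propensity
# models (a probit here, LogisticRegression in the DoWhy code below). A quick
# sketch, not from the original post, to eyeball how far apart the fitted
# propensity scores are. Note that sklearn's LogisticRegression is
# L2-penalized by default, which is another source of divergence.
covs = ['mmarried_', 'mage', 'mage2', 'fbaby_', 'medu']
ps_probit = res_probit.predict()  # P(mbsmoke_ = 1 | X) from the probit
ps_logit = LogisticRegression().fit(
    dta_cat[covs], dta_cat['mbsmoke_']
).predict_proba(dta_cat[covs])[:, 1]
print("Max |probit - logit| propensity gap:",
      np.max(np.abs(ps_probit - ps_logit)))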
# Outcome model: OLS model
formula_outcome = 'bweight ~ prenatal1_ + mmarried_ + mage + fbaby_'
mod = OLS.from_formula(formula_outcome, dta_cat)
tind = np.asarray(dta_cat['mbsmoke_'])  # treatment indicator as a NumPy array
teff = TreatmentEffect(mod, tind, results_select=res_probit)
res = teff.ipw()  # Compute POM and ATE using inverse probability weighting
print("Results from Statsmodels (ATE):", res)
print("Statsmodels ATT:", teff.ipw(effect_group=1))  # effect on the treated
print("Statsmodels ATC:", teff.ipw(effect_group=0))  # effect on the untreated
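# For completeness: run every estimator named in `methods` above, since
# statsmodels exposes each one as a TreatmentEffect method
# (sketch, not from the original post)
for name in methods:
    print(name, getattr(teff, name)())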
# DoWhy approach
np.random.seed(42)  # attempt at reproducibility (per the replicability question, this alone has not been enough)
model = CausalModel(
    data=dta_cat,
    treatment='mbsmoke_',
    outcome='bweight',
    common_causes=['mmarried_', 'mage', 'mage2', 'fbaby_', 'medu', 'prenatal1_']
)
identified_estimand = model.identify_effect()
print("Identified Estimand from DoWhy:", identified_estimand)
ATE = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),  # logistic, not the probit used in statsmodels
        # NOTE: the two keys below do not appear to be documented options of
        # DoWhy's matching estimator and may be silently ignored
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1  # 1-to-1 matching
    }
)
print("ATE from DoWhy:", ATE.value)
ATT = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='att'  # focus on treated units
)
print("ATT from DoWhy:", ATT.value)
ATC = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='atc'  # focus on untreated units
)
print("ATC from DoWhy:", ATC.value)
refutation = model.refute_estimate(
    identified_estimand,
    ATE,
    method_name='placebo_treatment_refuter'
)
print("Refutation result from DoWhy:", refutation)