py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License

dowhy replication issue and output discrepancy with statsmodels #1227

Closed dododobetter closed 3 months ago

dododobetter commented 4 months ago

Ask your question

Hi all, I'm new to the GitHub community and to Python. I have two questions about the dowhy package:

  1. Output Replicability Issue: The estimate from dowhy appears to vary slightly each time I run it. I've tried setting random seeds without success. How can I ensure that the output is consistent and replicable?

  2. Discrepancy with Statsmodels: I've noticed significant differences between the treatment effect estimates (ATE, ATT, ATC) obtained from dowhy and those generated by statsmodels. Both methods use the same identification approach (propensity score matching & probit model). Can anyone provide guidance on resolving this discrepancy?
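For the replicability question, a common first step is to re-seed every global RNG immediately before each estimation call. This is a minimal sketch assuming the run-to-run variation comes from Python's or NumPy's global random state; if the estimator draws from its own generator, a seed would instead have to be passed to that estimator (e.g. a `random_state` argument, where supported):

```python
import random

import numpy as np

def seed_everything(seed: int) -> None:
    """Seed the global RNGs that many estimation libraries draw from."""
    random.seed(seed)     # Python's built-in RNG
    np.random.seed(seed)  # NumPy's legacy global RNG

# Re-seeding immediately before each run makes global-RNG draws repeat exactly.
seed_everything(42)
first = np.random.rand(3)
seed_everything(42)
second = np.random.rand(3)
assert np.array_equal(first, second)
```

Note that seeding once at the top of a script is not always enough: any earlier call that consumes random numbers shifts the global state, so seeding right before each `estimate_effect` call is the safer pattern.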

Expected behavior

  1. DoWhy output should be replicable (i.e., exactly the same value on every run).
  2. The output should closely match the statsmodels/Stata/R outputs.

Version information:

Additional context

My code is below for reference:

# Imports (not shown in the original snippet); `res_st` is assumed here to be
# statsmodels' treatment-effects test-results module, which ships cataneo2.csv.
import os

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from statsmodels.discrete.discrete_model import Probit
from statsmodels.regression.linear_model import OLS
from statsmodels.treatment.treatment_effects import TreatmentEffect
from statsmodels.treatment.tests.results import results_teffects as res_st
from dowhy import CausalModel

# Dataset loading

cur_dir = os.path.abspath(os.path.dirname(res_st.__file__))
file_name = 'cataneo2.csv'
file_path = os.path.join(cur_dir, file_name)
dta_cat = pd.read_csv(file_path)
methods = ['ra', 'ipw', 'aipw', 'aipw_wls', 'ipw_ra']
methods_st = [
    ("ra", res_st.results_ra),
    ("ipw", res_st.results_ipw),
    ("aipw", res_st.results_aipw),
    ("aipw_wls", res_st.results_aipw_wls),
    ("ipw_ra", res_st.results_ipwra),
]
pd.set_option('display.width', 500)
dta_cat.head()

# Statsmodels approach

# Treatment selection model: probit model
formula = 'mbsmoke_ ~ mmarried_ + mage + mage2 + fbaby_ + medu'
res_probit = Probit.from_formula(formula, dta_cat).fit()  # Estimate the probability of smoking

# Outcome model: OLS model
formula_outcome = 'bweight ~ prenatal1_ + mmarried_ + mage + fbaby_'
mod = OLS.from_formula(formula_outcome, dta_cat)

# Treatment indicator variable
tind = np.asarray(dta_cat['mbsmoke_'])  # Treatment indicator (mbsmoke_) as a NumPy array
teff = TreatmentEffect(mod, tind, results_select=res_probit)

res = teff.ipw()  # Compute POM and ATE using inverse probability weighting
print("Results from Statsmodels (ATE):", res)

teff.ipw(effect_group=1)  # Average Treatment Effect on Treated
teff.ipw(effect_group=0)  # ATE on untreated
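For intuition about what `teff.ipw()` computes, the inverse-probability-weighting ATE can be reproduced by hand: weight each outcome by the inverse of its treatment (or control) propensity and take the difference of the weighted means. A minimal sketch on synthetic data with a known effect of 2.0 (the data-generating process here is illustrative, not the cataneo2 data):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic data: one confounder x drives both treatment and outcome.
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))          # true propensity score
t = (rng.uniform(size=n) < p).astype(float)
y = 2.0 * t + x + rng.normal(size=n)  # true ATE is 2.0

# Hajek (normalized) IPW estimator of the ATE.
w1 = t / p
w0 = (1.0 - t) / (1.0 - p)
ate = (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()
print(round(ate, 2))  # close to 2.0
```

The normalized (Hajek) form divides by the sum of the weights rather than by n, which is more stable when some propensities are near 0 or 1.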

# DoWhy approach

np.random.seed(42)

model = CausalModel(
    data=dta_cat,
    treatment='mbsmoke_',
    outcome='bweight',
    common_causes=['mmarried_', 'mage', 'mage2', 'fbaby_', 'medu', 'prenatal1_']
)

identified_estimand = model.identify_effect()
print("Identified Estimand from DoWhy:", identified_estimand)

ATE = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),  # Note: logistic, not probit, so it differs from the statsmodels setup above
        'matching_algorithm': 'nearest_neighbor',  # May be silently ignored if the estimator does not accept this parameter
        'n_neighbors': 1  # Intended 1-to-1 matching; also may be ignored if unsupported
    }
)
print("ATE from DoWhy:", ATE.value)

ATT = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='att'  # Focus on treated units
)
print("ATT from DoWhy:", ATT.value)

ATC = model.estimate_effect(
    identified_estimand,
    method_name='backdoor.propensity_score_matching',
    method_params={
        'propensity_score_model': LogisticRegression(),
        'matching_algorithm': 'nearest_neighbor',
        'n_neighbors': 1
    },
    target_units='atc'  # Focus on untreated units
)
print("ATC from DoWhy:", ATC.value)

refutation = model.refute_estimate(
    identified_estimand,
    ATE,
    method_name='placebo_treatment_refuter'
)
print("Refutation result from DoWhy:", refutation)
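One likely source of the discrepancy: the two pipelines are not estimating the effect the same way. The statsmodels code uses inverse probability weighting with a probit selection model, while the DoWhy code uses 1-nearest-neighbor propensity score matching with a logistic model. Both are consistent, but they are different estimators on different propensity fits, so some gap in finite samples is expected. A minimal sketch of what 1-NN matching on the propensity score does, on the same kind of synthetic data as above (illustrative, not the cataneo2 data):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4_000

# Synthetic setup: one confounder, constant treatment effect of 2.0.
x = rng.normal(size=n)
p = 1.0 / (1.0 + np.exp(-x))          # propensity score (known here)
t = rng.uniform(size=n) < p
y = 2.0 * t + x + rng.normal(size=n)

# 1-to-1 nearest-neighbor matching on the propensity score, with replacement:
# for each treated unit, find the control whose score is closest.
p_t, y_t = p[t], y[t]
p_c, y_c = p[~t], y[~t]
match = np.abs(p_t[:, None] - p_c[None, :]).argmin(axis=1)
att = np.mean(y_t - y_c[match])
print(round(att, 2))  # close to 2.0
```

Matching discards information (each treated unit is compared to a single control), while IPW uses every observation with a weight, so even on identical data and identical propensity scores the two point estimates will generally differ.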
github-actions[bot] commented 4 months ago

This issue is stale because it has been open for 14 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 7 days since being marked as stale.