py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy

Different ATE estimates for DoubleML from EconML vs. DoWhy #1278

Open ankur-tutlani opened 2 weeks ago

ankur-tutlani commented 2 weeks ago

I tried using the same dataset with both the EconML and DoWhy functions, and I am getting different ATE estimates. The difference is about 10-20% on average, and sometimes more, between the estimates from the two libraries. All the variables in the dataset are continuous, including the treatment. I have kept the parameters consistent across both frameworks, along with the random seed. What could explain this divergence?
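
For reference, `data1` is not shown in the issue. A minimal synthetic stand-in with the assumed shape (continuous treatment `X`, continuous outcome `Y`, a single common cause `Z`) could look like the sketch below; the column names match the snippets, but the data-generating process is purely illustrative.

import numpy as np
import pandas as pd

# Hypothetical stand-in for data1 (not part of the original report):
# one continuous confounder Z, a continuous treatment X that depends on Z,
# and a continuous outcome Y that depends on both.
rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=n)
X = 20 + 0.5 * Z + rng.normal(scale=0.5, size=n)   # treatment centered near 19-20
Y = 2.0 * X + 1.5 * Z + rng.normal(size=n)
data1 = pd.DataFrame({"Y": Y, "X": X, "Z": Z})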

EconML code:

import numpy as np
import pandas as pd
from econml.dml import DML
import xgboost as xgb
from econml.sklearn_extensions.linear_model import StatsModelsRLM

# Define the nuisance and final models
model_y = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
model_t = xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100)
model_final = StatsModelsRLM(fit_intercept=True)

# Instantiate the DoubleML estimator
dml = DML(model_y=model_y, model_t=model_t, model_final=model_final,
          discrete_treatment=False, random_state=587921, cv=3)

# Fit the model and compute the ATE for a change from T0=19.32 to T1=19.13
dml.fit(Y=data1['Y'], T=data1['X'], X=data1['Z'])
dml.ate(X=data1['Z'], T0=19.32, T1=19.13)

DoWhy code:

from dowhy import CausalModel

model = CausalModel(
    data=data1,
    treatment='X',
    outcome='Y',
    common_causes=['Z'],
)

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

causal_estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.econml.dml.DML",
    confidence_intervals=False,
    control_value=19.32,
    treatment_value=19.13,
    method_params={
        "init_params": {
            'model_y': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'model_t': xgb.XGBRegressor(random_state=578, max_depth=3, n_estimators=100),
            'model_final': StatsModelsRLM(fit_intercept=True),
            'discrete_treatment': False,
            'random_state': 587921,
            'cv': 3,
        },
        "fit_params": {},
        'num_null_simulations': 399,
        'num_simulations': 399,
    },
)

print(causal_estimate.value)
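
To confirm that the DoWhy wrapper actually constructed the EconML estimator with the intended parameters, the fitted EconML object can usually be pulled off the returned estimate. The `_estimator_object` attribute name below is an assumption about recent DoWhy versions, so the access is guarded:

# Hypothetical check: recent DoWhy versions attach the wrapped EconML
# estimator to the returned CausalEstimate; the attribute name is an
# assumption, so fall back gracefully if it is absent.
econml_obj = getattr(causal_estimate, "_estimator_object", None)
if econml_obj is not None:
    print(type(econml_obj))   # expected to be econml.dml.DML
    print(econml_obj)         # inspect how the wrapper configured it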

Version information:

drawlinson commented 2 days ago

The order of samples drawn from the (potentially multiple) PRNGs could differ slightly between the two code paths. Even if the algorithm is conceptually identical, that alone would produce differences in the output.

I suggest you perform the experiment 100 times with each library and plot the resulting estimate distributions. If the estimate distributions are not significantly different, then there is no bug and your estimate simply has high variance.
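
A minimal sketch of that comparison, assuming `data1`, `model`, and `identified_estimand` from the snippets above are in scope; the per-run seeds and the plotting choices are illustrative, not part of the original report.

import matplotlib.pyplot as plt

# Repeat both estimations with varying seeds and compare the estimate distributions.
econml_ates, dowhy_ates = [], []
for seed in range(100):
    # Direct EconML run with this seed.
    dml = DML(model_y=xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
              model_t=xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
              model_final=StatsModelsRLM(fit_intercept=True),
              discrete_treatment=False, random_state=seed, cv=3)
    dml.fit(Y=data1['Y'], T=data1['X'], X=data1['Z'])
    econml_ates.append(dml.ate(X=data1['Z'], T0=19.32, T1=19.13))

    # DoWhy wrapper run with the same seed and models.
    est = model.estimate_effect(
        identified_estimand,
        method_name="backdoor.econml.dml.DML",
        confidence_intervals=False,
        control_value=19.32,
        treatment_value=19.13,
        method_params={
            "init_params": {
                'model_y': xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
                'model_t': xgb.XGBRegressor(random_state=seed, max_depth=3, n_estimators=100),
                'model_final': StatsModelsRLM(fit_intercept=True),
                'discrete_treatment': False, 'random_state': seed, 'cv': 3,
            },
            "fit_params": {},
        },
    )
    dowhy_ates.append(est.value)

# Plot the two estimate distributions side by side.
plt.hist(econml_ates, bins=20, alpha=0.5, label='EconML direct')
plt.hist(dowhy_ates, bins=20, alpha=0.5, label='DoWhy wrapper')
plt.xlabel('ATE estimate')
plt.legend()
plt.show()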