py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Results change after each run #245

Status: Open. Shafi2016 opened this issue 4 years ago.

Shafi2016 commented 4 years ago

Thanks for the very nice work!

I am running a basic model with the code below. I get different results each time I run it, even though random_state is fixed to 504:

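from econml.dml import LinearDMLCateEstimator
from sklearn.ensemble import RandomForestRegressor

# Y, T, X, W and X_test are the user's own data arrays
est = LinearDMLCateEstimator(model_y=RandomForestRegressor(), model_t=RandomForestRegressor(), n_splits=2, random_state=504)
est.fit(Y, T, X, W, inference='statsmodels')
te_pred = est.effect(X_test)
te_pred_interval = est.const_marginal_effect_interval(X_test, alpha=0.05)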

kbattocchi commented 4 years ago

Thanks for the bug report. I believe this behavior is by design, but I can see how it is confusing. The random_state argument to our estimators controls their internal use of randomness (in the case of the DML estimators, this involves how samples are split into folds for cross-fitting). If the submodels used are deterministic, then the output will be deterministic; however, if the submodels themselves use randomness then you will need to control that as well to get an overall deterministic result. In your case, this means that you need to also pass a fixed random_state argument to each RandomForestRegressor. Does that make sense?
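A toy sketch of the point above, using only NumPy rather than EconML: the hypothetical function `dml_ate` does two-fold cross-fitting (partialling out) with a randomized nuisance model, here a bagged linear regression standing in for RandomForestRegressor. `split_seed` plays the role of the estimator's random_state (it only controls the fold assignment), while `model_seed` plays the role of each submodel's own random_state. This is not EconML's implementation; the function names, the simulated data and the true effect of 2.0 are assumptions chosen for illustration.

```python
import numpy as np


def bagged_linear_fit_predict(x_train, y_train, x_pred, seed, n_bags=20):
    """Hypothetical randomized nuisance model: bagged 1-D least squares.

    seed=None draws fresh OS entropy on every call, much like leaving
    RandomForestRegressor(random_state=None) unset.
    """
    rng = np.random.default_rng(seed)
    preds = np.zeros(len(x_pred))
    for _ in range(n_bags):
        idx = rng.integers(0, len(x_train), size=len(x_train))  # bootstrap sample
        design = np.column_stack([np.ones(len(idx)), x_train[idx]])
        coef, *_ = np.linalg.lstsq(design, y_train[idx], rcond=None)
        preds += coef[0] + coef[1] * x_pred
    return preds / n_bags


def dml_ate(y, t, x, split_seed, model_seed, n_splits=2):
    """Hypothetical toy cross-fitted partialling-out estimate of a constant effect."""
    n = len(y)
    # split_seed only controls the fold assignment (like the estimator's random_state)
    fold_rng = np.random.default_rng(split_seed)
    folds = np.array_split(fold_rng.permutation(n), n_splits)
    y_res = np.empty(n)
    t_res = np.empty(n)
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # model_seed is the nuisance models' own seed (like each submodel's random_state)
        y_hat = bagged_linear_fit_predict(x[train_idx], y[train_idx], x[test_idx], model_seed)
        t_hat = bagged_linear_fit_predict(x[train_idx], t[train_idx], x[test_idx], model_seed)
        y_res[test_idx] = y[test_idx] - y_hat
        t_res[test_idx] = t[test_idx] - t_hat
    # Final stage: regress outcome residuals on treatment residuals (no intercept)
    return float(t_res @ y_res / (t_res @ t_res))


# Simulated data with a true treatment effect of 2.0 (assumed for illustration)
data_rng = np.random.default_rng(0)
n = 500
x = data_rng.normal(size=n)
t = 0.5 * x + data_rng.normal(size=n)
y = 2.0 * t + x + data_rng.normal(size=n)

# Only the split seed fixed: the nuisance models still draw fresh randomness
a = dml_ate(y, t, x, split_seed=504, model_seed=None)
b = dml_ate(y, t, x, split_seed=504, model_seed=None)

# Both seeds fixed: the whole pipeline is deterministic
c = dml_ate(y, t, x, split_seed=504, model_seed=504)
d = dml_ate(y, t, x, split_seed=504, model_seed=504)
print(a, b)    # typically differ in the later digits
print(c == d)  # True
```

Fixing only `split_seed` leaves the bootstrap draws inside the nuisance model unseeded, so `a` and `b` generally differ; fixing both seeds makes `c` and `d` identical, which mirrors passing random_state to each RandomForestRegressor as well as to the estimator.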

Shafi2016 commented 4 years ago

Thank you so much! It works now that I set random_state on model_y, model_t and the LinearDMLCateEstimator itself.