py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.
https://www.pywhy.org/dowhy
MIT License
7.08k stars 929 forks

Setting the seed for estimate effect methods #418

Open misclassified opened 2 years ago

misclassified commented 2 years ago

Hi,

How can I set the seed for the estimate effect methods? Even when I set a seed with NumPy, my results change slightly each time I run the same code. This hurts reproducibility and trust in the analysis. For a method like propensity_score_stratification, this is probably due to the propensity model step, which introduces some randomness.

estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_stratification",
                                 target_units="att")
# Return 65.32 in iteration 1
# Return 63.46 in iteration 2
# Return 64.67 in iteration 3

I can't find a way to pass the seed as a keyword argument.

Thanks, Giovanni
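
One way to check whether the variation comes from Python's or NumPy's global random state is to reseed both immediately before every call and compare repeated runs. The sketch below is only a diagnostic under assumptions: `run_with_seed`, `estimates_are_stable`, and `toy_estimate` are hypothetical names, and `toy_estimate` is a stand-in for the real `estimate_effect` call. Whether DoWhy's estimators actually draw from the global NumPy state is not confirmed in this thread.

```python
import random

import numpy as np


def run_with_seed(estimate_fn, seed=0):
    """Reseed Python's and NumPy's global RNGs, then call estimate_fn."""
    random.seed(seed)
    np.random.seed(seed)
    return estimate_fn()


def estimates_are_stable(estimate_fn, seed=0, repeats=3):
    """Return True if repeated, freshly seeded runs give the same value."""
    values = [run_with_seed(estimate_fn, seed) for _ in range(repeats)]
    return all(np.isclose(v, values[0]) for v in values)


# Hypothetical stand-in for the real estimation call, e.g. a function that
# runs model.estimate_effect(...) and returns the numeric estimate.
def toy_estimate():
    return 65.0 + np.random.normal(scale=1.0)


stable = estimates_are_stable(toy_estimate)
print("stable under reseeding:", stable)
```

If the real estimate is still unstable when wrapped this way, the randomness is likely coming from somewhere other than the global seeds, for example from a model with its own random state or from the order of the input rows.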

zahs123 commented 2 years ago

Hi, I was able to get reproducible results only when my data was in a fixed order. For some reason my SQL query was returning the rows in a random order despite ordering by the unique id. When I change the order of the exact same data, I get wildly different results. For example, if I shuffle my pandas dataframe 'df', my estimate changes by 1000%. I find the size of the difference a little disconcerting. Is this to be expected? (This was using propensity score matching.)

Why does the order of the data (for the same data) give such wildly different results? I cannot see from the code where this may affect the final output.
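
As an illustration of how row order alone can change a matching estimate, here is a toy nearest-neighbour matcher in which tied scores are resolved by row position. This is a sketch of one possible mechanism, not a description of DoWhy's propensity score matching; `toy_att`, `canonical`, and the column names are hypothetical. It also shows that sorting by a unique id before estimating makes the result independent of the order in which the rows arrived.

```python
import numpy as np
import pandas as pd


def toy_att(df):
    """ATT from 1-nearest-neighbour matching on a 'score' column.

    Ties are broken by row order, because np.argmin returns the first minimum.
    """
    treated = df[df["treated"] == 1]
    control = df[df["treated"] == 0].reset_index(drop=True)
    control_scores = control["score"].to_numpy()
    diffs = []
    for _, row in treated.iterrows():
        nearest = int(np.argmin(np.abs(control_scores - row["score"])))
        diffs.append(row["outcome"] - control.loc[nearest, "outcome"])
    return float(np.mean(diffs))


def canonical(df):
    """Put rows in a fixed order by sorting on the unique id."""
    return df.sort_values("id").reset_index(drop=True)


# One treated unit and two controls with identical scores but different outcomes.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "treated": [1, 0, 0],
    "score": [0.5, 0.5, 0.5],
    "outcome": [10.0, 4.0, 9.0],
})

att_original = toy_att(df)             # tie resolved in favour of control id 2
att_reversed = toy_att(df.iloc[::-1])  # tie resolved in favour of control id 3
att_canonical = toy_att(canonical(df.sample(frac=1, random_state=0)))

print(att_original, att_reversed, att_canonical)
```

In this toy data the same three rows give an estimate of 6.0 or 1.0 depending only on which tied control comes first, while the sorted version always matches the original ordering.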