Open misclassified opened 2 years ago
Hi, I was able to reproduce this when my data was in a certain order. For some reason my SQL query was returning the rows in a seemingly random order despite ordering by the unique id. When I change the order of the exact same data, I get wildly different results. For example, with my pandas dataframe 'df', shuffling the rows changed my estimate by 1000%. I find the size of this difference a little disconcerting. Is this to be expected? (This was using propensity score matching.)
Why does the order of the rows (for the same data) give such wildly different results? I cannot see from the code where the row order would affect the final output.
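One possible explanation, offered as an assumption rather than something confirmed from the library's code, is tie-breaking in nearest-neighbor matching: when several control units have the same propensity score, the first one encountered is typically chosen, so row order decides the match. The sketch below is a toy stand-in written with the standard library only. The function names `att_nearest_neighbor` and `canonical_order` are hypothetical and the three rows are made-up illustration data, not results from any real analysis.

```python
def att_nearest_neighbor(rows):
    """Toy 1-nearest-neighbor matching on the propensity score.

    rows: list of (treated, propensity_score, outcome) tuples.
    Returns the average treated-minus-matched-control outcome difference.
    """
    treated = [r for r in rows if r[0] == 1]
    controls = [r for r in rows if r[0] == 0]
    diffs = []
    for _, score, outcome in treated:
        # min() returns the FIRST control with the smallest distance,
        # so ties between controls are broken by row order.
        match = min(controls, key=lambda c: abs(c[1] - score))
        diffs.append(outcome - match[2])
    return sum(diffs) / len(diffs)


def canonical_order(rows):
    """Sort rows into a fixed order so tie-breaking no longer depends on input order."""
    return sorted(rows)


# Illustration data (assumed): one treated unit, two controls tied on the score.
rows = [
    (1, 0.5, 10.0),  # treated
    (0, 0.5, 9.0),   # control A
    (0, 0.5, 0.0),   # control B, same score as A but a very different outcome
]
reordered = [rows[0], rows[2], rows[1]]  # same data, controls swapped

est_original = att_nearest_neighbor(rows)        # matched to control A
est_reordered = att_nearest_neighbor(reordered)  # matched to control B

# Sorting first makes both orderings give the same estimate.
est_sorted_a = att_nearest_neighbor(canonical_order(rows))
est_sorted_b = att_nearest_neighbor(canonical_order(reordered))
```

If this is the cause, sorting the dataframe by a stable key (for example the unique id) before estimation makes the result repeatable, although it only fixes which tied match is chosen and does not make that choice more correct.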
Hi,
how can I set the seed for the effect estimation methods? Even when setting a global seed with NumPy, my results keep changing slightly when I run the same code at different times. This is rather inconvenient for reproducibility and for trust in the analysis. For a method like propensity_score_stratification, this is probably due to the propensity model step, which introduces a degree of randomness.
I can't find a way to pass the seed as a keyword argument.
Thanks, Giovanni
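A workaround that is sometimes enough when no seed argument is exposed is to reset the global random state immediately before every estimation call. The sketch below is a minimal illustration under stated assumptions: `run_with_seed` is a hypothetical helper, and `bootstrap_mean` is a stand-in for a randomized estimator, not the library's method. It seeds Python's `random` module and, if NumPy is installed, NumPy's legacy global state via `np.random.seed`.

```python
import random


def run_with_seed(seed, fn, *args, **kwargs):
    """Hypothetical helper: reset global RNG state, then call fn."""
    random.seed(seed)
    try:
        import numpy as np
        # Seeds NumPy's legacy global state only. Generators created with
        # np.random.default_rng() are NOT affected by this call.
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; only the stdlib RNG is seeded
    return fn(*args, **kwargs)


def bootstrap_mean(values, n_boot=200):
    """Stand-in for a randomized estimator: mean of bootstrap sample means."""
    means = []
    for _ in range(n_boot):
        sample = random.choices(values, k=len(values))
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


data = [1.0, 2.0, 3.0, 4.0, 5.0]

# Reseeding right before each call gives identical results.
first = run_with_seed(42, bootstrap_mean, data)
second = run_with_seed(42, bootstrap_mean, data)
```

This only helps if every random step draws from the global state. A component that builds its own generator, or a result that depends on row order or thread scheduling, will still vary, so seeding once at the top of a script may not be sufficient.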