py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

hyperparameters tuning #481

Open yjun14 opened 3 years ago

yjun14 commented 3 years ago

In the FAQ, you mention that hyperparameters can be tuned using all the data:

"How do I select the hyperparameters of the first stage models?

Alternatively, you can pick the best first stage models outside of the EconML framework and pass in the selected models to EconML. This can save on runtime and computational resources. Furthermore, it is statistically more stable since all data is being used for hyper-parameter tuning rather than a single fold inside of the DML algorithm (as long as the number of hyperparameter values that you are selecting over is not exponential in the number of samples, this approach is statistically valid)."
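For concreteness, the workflow described in the quoted FAQ might look roughly like the sketch below. It is only an illustration of the mechanics, not the library's recommended recipe: the synthetic data, the random forest learners, the parameter grid, and the helper name `tune` are all assumptions made for this example. The first stage models are tuned once on the full data set with scikit-learn's `GridSearchCV`, and the selected estimators are then passed to `LinearDML`, which refits them inside its own cross-fitting folds.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely illustrative (assumption): the true effect of T on Y is 2.0.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
T = X[:, 0] + rng.normal(size=n)
Y = 2.0 * T + X[:, 1] + rng.normal(size=n)

# Small, fixed grid (assumption). The FAQ's caveat is that the number of
# candidate values should not grow exponentially with the sample size.
param_grid = {"max_depth": [3, 5], "min_samples_leaf": [5, 20]}


def tune(target):
    """Hypothetical helper: pick hyperparameters using ALL the data."""
    search = GridSearchCV(
        RandomForestRegressor(n_estimators=50, random_state=0),
        param_grid,
        cv=3,
    )
    search.fit(X, target)
    return search.best_estimator_


# Tune the outcome model and the treatment model outside of EconML.
model_y = tune(Y)
model_t = tune(T)

# The import is guarded only so the tuning part runs without econml installed.
try:
    from econml.dml import LinearDML
except ImportError:
    LinearDML = None

if LinearDML is not None:
    # EconML refits copies of the selected models within each cross-fitting fold.
    est = LinearDML(model_y=model_y, model_t=model_t, cv=3, random_state=0)
    est.fit(Y, T, X=X)
    effects = est.effect(X[:5])  # estimated effects for the first five rows
```

Only the hyperparameter choice uses all the data here. The nuisance predictions that enter the final stage are still produced out of fold by the estimator's cross-fitting.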

We are trying to understand the statistical validity of this approach. Can you please point us to the right papers?

Thank you.

vsyrgkanis commented 3 years ago

See the discussion in #454 and the references therein.