py-why / EconML

ALICE (Automated Learning and Intelligence for Causation and Economics) is a Microsoft Research project aimed at applying Artificial Intelligence concepts to economic decision making. One of its goals is to build a toolkit that combines state-of-the-art machine learning techniques with econometrics in order to bring automation to complex causal inference problems. To date, the ALICE Python SDK (econml) implements orthogonal machine learning algorithms such as the double machine learning work of Chernozhukov et al. This toolkit is designed to measure the causal effect of some treatment variable(s) t on an outcome variable y, controlling for a set of features x.
https://www.microsoft.com/en-us/research/project/alice/

Recast contextual bandit problem as causal inference #266

Open JunhaoWang opened 4 years ago

JunhaoWang commented 4 years ago

I have a contextual bandit problem with (S - state, A - action, R - reward) where S is a high-dimensional vector, A is a continuous value, and R is a continuous value. How do I learn the optimal mapping function from state to action to maximize reward? It seems the current package can only estimate a continuous treatment effect relative to a control treatment, but I don't have a control treatment. Furthermore, none of the current estimators scale to large, high-dimensional data (millions of samples in the train/test sets, thousands of dimensions). Is there a way to make them more scalable?

kbattocchi commented 4 years ago

Sorry for the late response. On scalability: this is feedback we've received from a few other people as well, so it's something we're starting to think about, but we don't have a short-term solution. Even in the longer term we don't plan to scale to distributed settings, but we would like to make it possible to efficiently process as much data as you can load onto a single machine.

For the first question, maybe you could elaborate on your goals, but I can't immediately see a good way to use our package in a traditional contextual bandit setting. We have estimators like the NonParamDMLCateEstimator that support continuous treatments and can learn nonparametric conditional treatment effects. It's true that the estimated effect is expressed as the difference in R when moving from A=0 to some other value a, but you can work around this by adding a prediction of the reward at A=0 (which you could get from the fitted first-stage y models, for instance). However, I'm not sure that the requirements on the data generating process for our DML methods to work correctly will be satisfied in a contextual bandit setting: unless your policy is completely random, your choice of which treatments to apply won't depend solely on the state S but also on previously seen rewards R.
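A rough sketch of the workaround described above, assuming the current `NonParamDML` class (which, as far as I know, is the newer name for `NonParamDMLCateEstimator`). The toy data, the gradient-boosting models, and the separately fitted baseline model for the reward at A=0 are all illustrative assumptions, not a tested recipe:

```python
# Hedged sketch: predict rewards for candidate actions by combining a baseline
# prediction at A=0 with the estimated effect of moving from A=0 to A=a.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from econml.dml import NonParamDML

# Toy data: S is the state/context, A a continuous action, R a continuous reward.
rng = np.random.default_rng(0)
n, d = 10_000, 50
S = rng.normal(size=(n, d))
A = rng.normal(size=n)
R = A * S[:, 0] + rng.normal(size=n)

est = NonParamDML(
    model_y=GradientBoostingRegressor(),     # first stage: E[R | S]
    model_t=GradientBoostingRegressor(),     # first stage: E[A | S]
    model_final=GradientBoostingRegressor(), # nonparametric CATE model
    discrete_treatment=False,
)
est.fit(R, A, X=S)

# Baseline reward at A=0: here a separate regression of R on (S, A) evaluated at
# A=0, standing in for the "prediction at A=0" mentioned above.
baseline_model = GradientBoostingRegressor().fit(np.column_stack([S, A]), R)

S_new = rng.normal(size=(5, d))
baseline = baseline_model.predict(np.column_stack([S_new, np.zeros(len(S_new))]))

# Predicted reward for each candidate action = baseline + effect of moving 0 -> a;
# taking the argmax per context gives a greedy policy over the action grid.
candidate_actions = np.linspace(-2, 2, 21)
rewards = np.stack([
    baseline + est.effect(S_new,
                          T0=np.zeros(len(S_new)),
                          T1=np.full(len(S_new), a))
    for a in candidate_actions
])
best_action = candidate_actions[rewards.argmax(axis=0)]
```

As noted, this only makes sense if the logged actions depend on S alone (plus noise); if the logging policy adapts to previously observed rewards, the DML assumptions are likely violated.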