uber / causalml

Uplift modeling and causal inference with machine learning algorithms

Question on uplift curve with multiple treatments #234

Closed: deeplaunch closed this issue 4 years ago

deeplaunch commented 4 years ago

In the tutorial uplift_trees_with_synthetic_data, the author compares, for each individual, the treatment with the highest predicted CATE (let's call it the "synthetic treatment") against the control group.

Is there a selection problem here? Specifically, by definition this skips the subset for which none of the treatments is better than control (this group is not in "synthetic treatment", but in "control").

If this is the case, should it be added to the "synthetic treatment" group?

On a related note, what the uplift curve doesn't seem to tell us is how much we gain from being able to assign different treatments to different parts of our population (vs. giving everyone the single treatment with the largest ATE). Sometimes, even if our AUUC does not look good, such differentiated treatment can still be helpful, particularly when we don't have a budget constraint. Is this understanding correct?

Sorry if my wording is too ambiguous. Do let me know if more clarification is needed.

Thanks!

jeongyoonlee commented 4 years ago

Thanks for the question, @deeplaunch. I added the author of the notebook, @t-tte, so that he can answer your question when he's available. Meanwhile, let me try to answer it.

> Is there some selection problem here? Specifically, by definition this will skip the subset for which none of the treatments is better than control (this group is not in "synthetic treatment", but in "control").
>
> If this is the case, should it be added to the "synthetic treatment" group?

For those for whom no treatment is better than control, the treatment recommended by the uplift model is indeed control. Therefore, it's correct to assign them to control instead of the synthetic treatment.

> On a related note, what the uplift curve doesn't seem to tell us is how much gain we can have by having the option to assign different treatment to our population (vs. the single treatment with the largest ATE). Sometimes even if our AUUC does not look good, such differentiated treatment can still be helpful particularly when we don't have a budget constraint. Is this understanding correct?

Actually, that's exactly what "synthetic treatment" is doing. With the synthetic treatment, we selected the treatment with the highest CATE estimate for each individual. Therefore, the uplift curve in the notebook shows the gain we'd get by selecting the best treatment for each individual.
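
As a rough illustration of the selection rule described above, here is a minimal sketch in plain NumPy. It is not the notebook's code: the function name `recommend_treatment`, the arm labels, and the CATE values are all made up for the example. It assumes each column of `cate` holds the estimated uplift of one treatment relative to control.

```python
import numpy as np

def recommend_treatment(cate, treatment_names, control_name="control"):
    """Pick, per individual, the arm with the highest estimated CATE.

    cate: array of shape (n_samples, n_treatments), uplift vs. control.
    Falls back to control when no treatment has a positive estimate.
    (Hypothetical helper, not part of causalml.)
    """
    cate = np.asarray(cate, dtype=float)
    names = np.asarray(treatment_names, dtype=object)
    best_idx = cate.argmax(axis=1)   # column index of the best treatment
    best_cate = cate.max(axis=1)     # its estimated uplift
    recommended = np.where(best_cate > 0, names[best_idx], control_name)
    return recommended, best_cate

# Made-up CATE estimates for three individuals and two treatments.
cate_hat = np.array([
    [0.10, 0.03],    # treatment1 looks best
    [-0.02, -0.05],  # nothing beats control
    [0.01, 0.04],    # treatment2 looks best
])
rec, best = recommend_treatment(cate_hat, ["treatment1", "treatment2"])
print(list(rec))
```

The second individual is the case asked about: no treatment has a positive estimate, so the recommendation is control.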

> Sorry if my wording is too ambiguous. Do let me know if more clarification is needed.
>
> Thanks!

No worries. Please let me know if you have any more questions. Thanks for your question.

t-tte commented 4 years ago

Thanks for the very good question, @deeplaunch! @jeongyoonlee already provided some great clarifications above. The only thing I'd like to add is that there are multiple ways to construct the contrast against which you evaluate the performance of an uplift model. In more recent work, I've preferred the approach set out in Kapelner et al. (2014), in which the treatment recommended by the uplift model is compared against alternative allocation strategies in an unseen sample. An example of this approach can be found in the unit selection notebook. There is also a recent paper that gives an overview of the various model comparison approaches that have been proposed.

deeplaunch commented 4 years ago

Thank you both for such detailed answers, @jeongyoonlee and @t-tte. They're very helpful.

> For those for whom no treatment is better than control, the recommended treatment by the uplift model is indeed control. Therefore, it's correct to assign them to control instead of synthetic treatment.

Definitely agree from an implementation standpoint. However, from an evaluation standpoint, the uplift in this subset should be 0, and that 0 should count towards the overall uplift (calculated as an average). If so, one way to do this is to include this group in both the "treatment" and "control" groups for evaluation.
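
To make the averaging argument concrete, here is a small worked example. The numbers and the helper name `population_average_uplift` are hypothetical. It assumes a known uplift among the individuals who are recommended some treatment and counts everyone recommended control as contributing exactly 0.

```python
def population_average_uplift(uplift_if_treated, share_recommended_treatment):
    """Average uplift over the whole population.

    Individuals recommended control keep their control outcome, so their
    uplift is 0 by construction. (Hypothetical helper for illustration.)
    """
    share_control = 1.0 - share_recommended_treatment
    return share_recommended_treatment * uplift_if_treated + share_control * 0.0

# Made-up numbers: 60% of people are recommended a treatment with an
# uplift of 0.05; the remaining 40% are recommended control.
overall = population_average_uplift(0.05, 0.6)
print(round(overall, 4))
```

Leaving the control-recommended 40% out of the average would report 0.05, whereas averaging over everyone gives roughly 0.03.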

> Actually, that's exactly what "synthetic treatment" is doing. With the synthetic treatment, we selected the treatment with the highest CATE estimate for each individual. Therefore, the uplift curve in the notebook shows the gain we'd get by selecting the best treatment for each individual.

Completely agree on the uplift curve. However, this can be further decomposed into:

  1. Assigning individualized treatments vs. assigning the treatment with the highest average treatment effect (ATE).
  2. Assigning the treatment with the highest ATE vs. random assignment.

From a practical standpoint, do you think 1 is more meaningful? In reality, we rarely assign treatments at random once we know which one has the highest ATE. If so, how should we evaluate it?
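
One possible way to estimate the two components above is sketched below, under some loud assumptions: the evaluation data come from a randomized experiment with known assignment probabilities, the policies were chosen on separate data, and "random assignment" means the experiment's own randomization across all arms, control included. The estimator is a plain inverse-propensity-weighted (IPW) mean. The function name `ipw_policy_value` and the toy data are invented for the illustration and are not taken from the notebook or from causalml.

```python
import numpy as np

def ipw_policy_value(y, w, policy, propensity):
    """IPW estimate of the mean outcome if everyone followed `policy`.

    y: observed outcomes; w: arm actually received (randomized);
    policy: arm the policy would assign to each unit;
    propensity: dict mapping arm label -> assignment probability.
    (Hypothetical helper for illustration.)
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w)
    policy = np.asarray(policy)
    p = np.array([propensity[arm] for arm in w])
    match = (w == policy)  # units whose observed arm agrees with the policy
    return float(np.mean(match * y / p))

# Toy, noise-free data: two segments (x), three arms, one unit per cell.
x = np.array([0, 0, 0, 1, 1, 1])
w = np.array(["control", "t1", "t2"] * 2)
y = np.array([0.1, 0.3, 0.1, 0.1, 0.1, 0.2])
propensity = {"control": 1 / 3, "t1": 1 / 3, "t2": 1 / 3}

individual = np.where(x == 0, "t1", "t2")  # best arm per segment
best_single = np.full(len(x), "t1")        # arm with the highest ATE overall

v_individual = ipw_policy_value(y, w, individual, propensity)
v_single = ipw_policy_value(y, w, best_single, propensity)
v_random = float(y.mean())                 # value of the experiment's randomization

gain_personalization = v_individual - v_single  # component 1
gain_best_single = v_single - v_random          # component 2
print(round(gain_personalization, 4), round(gain_best_single, 4))
```

In this toy data the two components happen to be about equal. On real data the IPW estimates are noisy, so confidence intervals (for example via bootstrap) would be needed before reading much into the split.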

Will check out the papers too! cc'ed @mavpanos