uber / causalml

Uplift modeling and causal inference with machine learning algorithms

Adding another tree-based approach: The delta-delta-p criterion #300

Closed jroessler closed 2 years ago

jroessler commented 3 years ago

Is your feature request related to a problem? Please describe.

All of your tree-based uplift approaches are based on either the work by Rzepakowski and Jaroszewicz (2012) or the work by Zhao et al. (2017). Yet there is another tree-based approach, introduced by Hansotia & Rukstales in 2002. The authors use a so-called delta-delta-p criterion for choosing splits when training the tree. In short: for each candidate split, they calculate the difference in response rate between the treatment and control groups in each leaf. They then calculate the difference between those differences across the leaves (hence the name delta-delta-p). The split (that is, the variable and its corresponding value) with the largest delta-delta-p value is selected. I think the method is very intuitive and easy to understand; it can not only serve well as a baseline but can also outperform other approaches. Thus, it should definitely be part of the uplift-modeling "toolbox".
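As a rough illustration of the criterion described above, here is a minimal sketch that scores candidate thresholds on a single numeric feature. It is not the code from the proposed pull request and not causalml's API: the function names (`uplift`, `delta_delta_p`, `best_split`), the 0/1 encoding of treatment and response, and the use of the absolute difference between the two uplifts are all assumptions made for illustration.

```python
import numpy as np

def uplift(y, w):
    """Response rate of treated (w == 1) minus response rate of control (w == 0).

    Returns None if either group is empty, since the difference is undefined.
    """
    treated, control = y[w == 1], y[w == 0]
    if len(treated) == 0 or len(control) == 0:
        return None
    return treated.mean() - control.mean()

def delta_delta_p(x, y, w, threshold):
    """Hypothetical helper: absolute difference between the uplifts of the
    two children produced by splitting on x <= threshold."""
    left = x <= threshold
    u_left = uplift(y[left], w[left])
    u_right = uplift(y[~left], w[~left])
    if u_left is None or u_right is None:
        return None  # split leaves a child without treated or control rows
    return abs(u_left - u_right)

def best_split(x, y, w):
    """Return (threshold, score) with the largest delta-delta-p value."""
    best_threshold, best_score = None, -1.0
    # The largest unique value would put every row in the left child, so skip it.
    for t in np.unique(x)[:-1]:
        score = delta_delta_p(x, y, w, t)
        if score is not None and score > best_score:
            best_threshold, best_score = t, score
    return best_threshold, best_score
```

A full tree would repeat this search over every feature at every node; the sketch only covers the scoring step for one feature.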

Describe the solution you'd like

I looked into your code, and thanks to how well organized it is, integrating the method by Hansotia & Rukstales seems very easy. I have already implemented the method and would be happy to share it with you via a pull request.

Additional context

Hansotia & Rukstales: Incremental value modeling (2002)

t-tte commented 3 years ago

This sounds good! Feel free to submit the PR.

jroessler commented 3 years ago

@t-tte: Sorry for my late response!

I forgot to mention that the DDP approach only works for the binary case, in which each customer either receives a single treatment or receives none. I think it is still relevant, but I'll mention the limitation in the documentation, and I'll probably add a warning if the user applies the DDP approach to a dataset with multiple treatments.
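One way such a warning could look is sketched below. This is only an assumption about the shape of the check, not the code that went into causalml: the helper name `check_binary_treatment` and its `control_name` default are hypothetical.

```python
import warnings

def check_binary_treatment(treatment, control_name="control"):
    """Hypothetical helper: return True if the labels contain exactly one
    treatment group besides the control group, otherwise warn and return False."""
    treatment_groups = set(treatment) - {control_name}
    if len(treatment_groups) != 1:
        warnings.warn(
            "The DDP criterion assumes a single treatment group plus control; "
            f"found {len(treatment_groups)} treatment groups.",
            UserWarning,
        )
        return False
    return True
```

Calling `check_binary_treatment(["control", "t1", "t2"])` would emit the warning, while `["control", "t1", "t1"]` would pass silently.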