h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Apache License 2.0

Feature Request: uplift trees #11818

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Request for a new split criterion for trees, and therefore for Distributed Random Forest and the Gradient Boosting Machine.

The split criterion (uplift trees) was originally proposed by Guelman et al. (2013) and Rzepakowski & Jaroszewicz (2012), and is concisely described by Gutierrez & Gerardy (2017, http://proceedings.mlr.press/v67/gutierrez17a/gutierrez17a.pdf) in Equations 12 and 14. It is currently available in R through upliftRF in the uplift package (a package for a similar algorithm by Athey & Imbens is called causalTree).

In a few words, instead of learning splits based on Gini / information gain on the outcome P(Y) as in traditional decision trees, an uplift tree learns splits based on the information gain on the difference in outcomes between two groups of users (test T and control C). Therefore, in addition to features X and outcome Y, an uplift tree also takes as input a 'treatment' indicator W ∈ {T, C}, which is used when learning the splits.
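A minimal sketch of such a criterion, to make the idea concrete. It assumes a binary outcome, a single treatment, and the squared Euclidean distance between the treated and control outcome distributions as the divergence (KL divergence could be swapped in). The function names are hypothetical, and this is not the H2O, upliftRF, or causalTree implementation.

```python
def rate(outcomes):
    """Share of positive outcomes in a list of 0/1 values."""
    return sum(outcomes) / len(outcomes)


def node_divergence(y, w):
    """Squared Euclidean distance between the treated (w == 1) and
    control (w == 0) outcome distributions inside one node."""
    treated = [yi for yi, wi in zip(y, w) if wi == 1]
    control = [yi for yi, wi in zip(y, w) if wi == 0]
    if not treated or not control:
        return 0.0  # no contrast can be measured without both groups
    p_t, p_c = rate(treated), rate(control)
    return (p_t - p_c) ** 2 + ((1 - p_t) - (1 - p_c)) ** 2


def uplift_split_gain(x, y, w, threshold):
    """Gain of splitting on x <= threshold: size-weighted divergence of
    the two children minus the divergence of the parent node."""
    n = len(x)
    left = [i for i in range(n) if x[i] <= threshold]
    right = [i for i in range(n) if x[i] > threshold]
    children = 0.0
    for idx in (left, right):
        if idx:
            children += len(idx) / n * node_divergence(
                [y[i] for i in idx], [w[i] for i in idx]
            )
    return children - node_divergence(y, w)


# Toy data: treatment helps when x == 0 and hurts when x == 1.
x = [0, 0, 0, 0, 1, 1, 1, 1]
w = [1, 1, 0, 0, 1, 1, 0, 0]
y = [1, 1, 0, 0, 0, 0, 1, 1]
gain = uplift_split_gain(x, y, w, threshold=0.5)
```

On the toy data the parent node shows no difference between the groups (both respond at 50%), while each child shows a maximal difference, so the split is rewarded even though a Gini split on Y alone would see nothing to gain.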

exalate-issue-sync[bot] commented 1 year ago

Neema Mashayekhi commented: Related support ticket requesting uplift: https://support.h2o.ai/a/tickets/90043

Reference: R's uplift RF implementation: https://www.rdocumentation.org/packages/uplift/versions/0.3.5/topics/upliftRF (it implements Random Forests with split criteria designed for binary uplift modeling tasks)

exalate-issue-sync[bot] commented 1 year ago

Neema Mashayekhi commented: Recent support ticket requesting an uplift split criterion for decision tree-based models: https://support.h2o.ai/a/tickets/97307

Attached are reference articles:

[^Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf] [^Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf]

[^Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf]

Uber's implementation: https://github.com/uber/causalml

exalate-issue-sync[bot] commented 1 year ago

Grigorios Fousas commented: I am very keen to help with this if you need help! I am very interested in Uplift modelling.

I have run several projects with uplift modelling and I have done my MSc dissertation on uplift modelling.

Also, N Radcliffe is a mentor and friend of mine (having a beer every now and then in Edinburgh) and P Surry an old colleague.

I have placed some more info here: https://github.com/h2oai/dai-domain-solution-recipes/tree/master/uplift, and I was planning to work on it more when I have time.

Essentially, uplift modelling needs two things that make it different from traditional classification or regression modelling:

The capability to consume a control flag, which indicates if someone is in a control or treated group.

A different split criterion. In addition to what is mentioned above, I would suggest Qini as a split criterion, which is the equivalent of Gini for uplift modelling cases. This is described in the paper "Real-World Uplift Modelling with Significance-Based Uplift Trees" attached above.

exalate-issue-sync[bot] commented 1 year ago

Neema Mashayekhi commented: Greg, this is a bit complex to solve on the H2O side. We plan to start it in Q4 and will reach out to you once we start. Thanks for the help!

exalate-issue-sync[bot] commented 1 year ago

Grigorios Fousas commented: Thanks for the update!

I am uploading part of my dissertation, with some theory on uplift modelling and a description of how Portrait Miner, a piece of software that is now almost dead, approached the uplift modelling task. Portrait Miner is a child of Radcliffe and Surry, and I can maybe dig it up from my old files and show you how it works if you are interested.

[^Dissertation chapters 2-3.pdf]

exalate-issue-sync[bot] commented 1 year ago

Grigorios Fousas commented: Another great resource!

https://github.com/uber/causalml is the most complete I have seen so far out there.

Thanks to [~accountid:5b8d534896cb052b5f659f47] who found it.

exalate-issue-sync[bot] commented 1 year ago

Juan Telleria commented: Note that Booking.com has already released an uplift modeling Python package, based on H2O-3 for performance.


exalate-issue-sync[bot] commented 1 year ago

Juan Telleria commented: The Python H2O-3 code can be found here: https://github.com/bookingcom/upliftml/blob/main/upliftml/models/h2o.py

exalate-issue-sync[bot] commented 1 year ago

Veronika Maurerová commented: Uplift trees are now implemented via the DRF algorithm, currently only for binomial classification and a single treatment group. Of the metrics, AUUC is available now.

More features will be implemented soon (more metrics, grid search, early stopping, etc.).
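For readers unfamiliar with AUUC (area under the uplift curve), here is a rough sketch of one common variant. It assumes a Qini-style curve (treated responders minus control responders scaled to the treated group size) evaluated after every row and averaged. The function names are hypothetical, and the computation in H2O may differ, for example in how it bins the data or normalises the curve.

```python
def qini_curve(uplift_pred, y, w):
    """Qini-style cumulative uplift after targeting the top-k rows,
    for k = 1..n, with rows ranked by predicted uplift (descending)."""
    order = sorted(range(len(y)), key=lambda i: -uplift_pred[i])
    n_t = n_c = 0        # treated / control rows seen so far
    y_t = y_c = 0.0      # treated / control responders seen so far
    curve = []
    for i in order:
        if w[i] == 1:
            n_t += 1
            y_t += y[i]
        else:
            n_c += 1
            y_c += y[i]
        # Scale control responders to the treated group size;
        # use 0 while no control row has been seen yet.
        scale = n_t / n_c if n_c else 0.0
        curve.append(y_t - y_c * scale)
    return curve


def auuc(uplift_pred, y, w):
    """Area under the uplift curve, approximated by the mean curve value."""
    curve = qini_curve(uplift_pred, y, w)
    return sum(curve) / len(curve)


# Toy data: y is the binary outcome, w is 1 for treated and 0 for control.
y = [1, 0, 0, 1]
w = [1, 0, 1, 0]
good_ranking = auuc([0.9, 0.8, 0.2, 0.1], y, w)
bad_ranking = auuc([0.1, 0.2, 0.8, 0.9], y, w)
```

A model that ranks treated responders first gets a higher value than one that ranks control responders first, which is the property that makes the metric usable for comparing uplift models.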

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4940
Assignee: Veronika Maurerová
Reporter: Nidhi Mehta
State: Resolved
Fix Version:
Attachments: Available (Count: 4)
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/5546
https://github.com/h2oai/h2o-3/pull/5547
https://github.com/h2oai/h2o-3/pull/5565
https://github.com/h2oai/h2o-3/pull/5576
https://github.com/h2oai/h2o-3/pull/5170
https://github.com/h2oai/h2o-3/pull/5224
https://github.com/h2oai/h2o-3/pull/5918
https://github.com/h2oai/h2o-3/pull/5919
https://github.com/h2oai/h2o-3/pull/5927
https://github.com/h2oai/h2o-3/pull/5968
https://github.com/h2oai/h2o-3/pull/5620
https://github.com/h2oai/h2o-3/pull/5624
https://github.com/h2oai/h2o-3/pull/5681

Attachments From Jira

Attachment Name: Dissertation chapters 2-3.pdf
Attached By: Grigorios Fousas
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Dissertation chapters 2-3.pdf

Attachment Name: Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Gutierrez P, Gerardy JY 2016 - Causal Inference and Uplift Modeling A review of the literature.pdf

Attachment Name: Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Radcliffe NJ, Surry PD 2011 - Real-World Uplift Modelling with Significance-Based Uplift Trees.pdf

Attachment Name: Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf
Attached By: Neema Mashayekhi
File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4940/Rzepakowski, Jaroszewicz 2012 - Decision trees for uplift modeling with single and multiple treatments.pdf