h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Decision tree surrogates w/ plain language rules #10900

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Two things:

I think there is a tool for this in Python -- I saw [this|http://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree], but I recall there being a separate Python module that does this.
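Below is a minimal sketch of the scikit-learn approach from the linked Stack Overflow question. It walks the parallel arrays that a fitted scikit-learn tree exposes on `clf.tree_` (`children_left`, `children_right`, `feature`, `threshold`, `value`) and emits one rule per leaf. `extract_rules` is a hypothetical helper, and the arrays are a small hand-made example rather than a fitted model. If a plain text dump of the tree is enough, scikit-learn 0.21+ also ships `sklearn.tree.export_text`.

```python
from typing import List, Sequence

TREE_LEAF = -1  # scikit-learn marks "no child" with -1, so leaves have children_left == -1


def extract_rules(children_left: Sequence[int],
                  children_right: Sequence[int],
                  feature: Sequence[int],
                  threshold: Sequence[float],
                  value: Sequence[float],
                  feature_names: Sequence[str]) -> List[str]:
    """Return one 'IF ... THEN ...' rule per leaf, in depth-first (left-first) order."""
    rules: List[str] = []

    def walk(node: int, conditions: List[str]) -> None:
        if children_left[node] == TREE_LEAF:
            body = " AND ".join(conditions) or "TRUE"
            rules.append(f"IF {body} THEN {value[node]:g}")
            return
        name = feature_names[feature[node]]
        t = threshold[node]
        # scikit-learn sends samples with x <= threshold to the left child
        walk(children_left[node], conditions + [f"{name} <= {t:g}"])
        walk(children_right[node], conditions + [f"{name} > {t:g}"])

    walk(0, [])
    return rules


# Hand-made example arrays (not a fitted model). For a fitted single-output
# DecisionTreeRegressor `clf`, the real ones are clf.tree_.children_left,
# clf.tree_.children_right, clf.tree_.feature, clf.tree_.threshold and
# clf.tree_.value[:, 0, 0].
rules = extract_rules(
    children_left=[1, 3, -1, -1, -1],
    children_right=[2, 4, -1, -1, -1],
    feature=[0, 1, -2, -2, -2],
    threshold=[1.5, 0.5, -2.0, -2.0, -2.0],
    value=[0.2, 0.15, 0.7, 0.1, 0.3],
    feature_names=["PAY_0", "PAY_2"],
)
for rule in rules:
    print(rule)
```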

There are also several tools that do this in [R|http://stackoverflow.com/questions/29618490/get-decision-tree-rule-path-pattern-for-every-row-of-predicted-dataset-for-rpart].

Related: https://0xdata.atlassian.net/browse/PUBDEV-4324

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: Is there any way to abstract this away from h2o3 so that it could be used easily from h2o4, secure, etc.?

Are we planning to have POJOs in h2o4 and can we extract rules from the POJO files themselves?

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: Consider reading from the MOJO instead.

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: First rough cuts here: https://github.com/h2oai/mli/blob/master/notebooks/dt_surrogate.ipynb

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: Basically completed in H2O Driverless AI project. Closing.

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: Available in Driverless AI.

exalate-issue-sync[bot] commented 1 year ago

Gregory Kanevsky commented: Reopening.

Proposal: generate a sequence of plain-language rules for both classification and regression models, like this:

{noformat}
Tree 01:
  Rule 01: IF PAY_0 < 1.5 OR NULL AND PAY_2 < 0.5 AND BILL_AMT > 5533 THEN AVERAGE VALUE OF DEFAULT PAYMENT NEXT MONTH IS 0.123
  Rule 02: IF PAY_0 >= 1.5 AND PAY_2 >= 0.5 OR NULL AND BILL_AMT <= 5533 OR NULL THEN AVERAGE VALUE OF DEFAULT PAYMENT NEXT MONTH IS 0.321
  ...
{noformat}
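A minimal sketch of how the proposed format could be generated, assuming a hypothetical nested-dict tree in which each split records its feature, its threshold, and whether missing values go left (`na_left`). The left branch is taken when `feature < threshold`, and whichever branch receives missing values gets `OR NULL`. `tree_rules` is a hypothetical helper, and the tree and its values are made up for illustration.

```python
from typing import Any, Dict, List

# Hypothetical layout: split nodes have feature/threshold/na_left/left/right,
# leaf nodes are {"value": ...}.
Node = Dict[str, Any]


def tree_rules(tree: Node, tree_no: int, target: str) -> List[str]:
    """Emit one plain-language rule per leaf, in left-first traversal order."""
    rules: List[str] = []

    def walk(node: Node, conds: List[str]) -> None:
        if "value" in node:  # leaf
            body = " AND ".join(conds) or "TRUE"
            rules.append(
                f"Tree {tree_no:02d}: Rule {len(rules) + 1:02d}: IF {body} "
                f"THEN AVERAGE VALUE OF {target} IS {node['value']:g}"
            )
            return
        name, t = node["feature"], node["threshold"]
        left, right = f"{name} < {t:g}", f"{name} >= {t:g}"
        # Missing values follow exactly one branch; make that explicit.
        if node["na_left"]:
            left += " OR NULL"
        else:
            right += " OR NULL"
        walk(node["left"], conds + [left])
        walk(node["right"], conds + [right])

    walk(tree, [])
    return rules


# Illustrative tree; thresholds and leaf values are made up.
example = {
    "feature": "PAY_0", "threshold": 1.5, "na_left": True,
    "left": {
        "feature": "PAY_2", "threshold": 0.5, "na_left": False,
        "left": {"value": 0.123},
        "right": {"value": 0.321},
    },
    "right": {"value": 0.6},
}
rules = tree_rules(example, tree_no=1, target="DEFAULT PAYMENT NEXT MONTH")
print("\n".join(rules))
```

Under the usual AND-before-OR precedence, `A OR NULL AND B` is ambiguous to a reader; wrapping each condition in parentheses would make the intent explicit, at the cost of diverging slightly from the proposed wording.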

Sort the rules in either:

option 1: ascending order of their leaf values

option 2: the order in which the tree's leaves are traversed
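The two sort options can be sketched as follows, assuming the rules arrive in traversal order as (text, leaf value) pairs. `Rule` and `order_rules` are hypothetical names, and the rules and values are made up.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    text: str     # plain-language rule
    value: float  # leaf value the rule ends in


def order_rules(rules: List[Rule], by_value: bool) -> List[Rule]:
    """Option 1 (by_value=True): ascending leaf value.
    Option 2 (by_value=False): the order in which the tree walk visited the
    leaves, assumed here to be the order of the input list."""
    if by_value:
        # sorted() is stable, so rules with equal values keep traversal order.
        return sorted(rules, key=lambda r: r.value)
    return list(rules)


# Illustrative rules in traversal order; values are made up.
traversal = [
    Rule("IF PAY_0 < 1.5 AND PAY_2 < 0.5 THEN 0.123", 0.123),
    Rule("IF PAY_0 < 1.5 AND PAY_2 >= 0.5 THEN 0.321", 0.321),
    Rule("IF PAY_0 >= 1.5 THEN 0.05", 0.05),
]
for r in order_rules(traversal, by_value=True):
    print(r.text)
```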

exalate-issue-sync[bot] commented 1 year ago

Gregory Kanevsky commented: The problem with plain-English rules is that they are not flexible: they do report the tree rules, but if a user wants something slightly different, they would be hard to use. We should start with plain English while keeping the option to extend the functionality in future releases.
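One way to start with plain English while keeping room to extend it is to build rules as structured objects first and render text last. The sketch below uses hypothetical `Condition` and `Rule` dataclasses; the same object renders either to the proposed plain-English line or to JSON for users who want something different. The example rule is made up.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Condition:
    feature: str
    op: str                # e.g. "<" or ">="
    threshold: float
    or_null: bool = False  # True if missing values also take this branch

    def to_text(self) -> str:
        text = f"{self.feature} {self.op} {self.threshold:g}"
        return (text + " OR NULL") if self.or_null else text


@dataclass
class Rule:
    tree: int
    rule: int
    conditions: List[Condition] = field(default_factory=list)
    value: float = 0.0

    def to_plain_english(self, target: str) -> str:
        body = " AND ".join(c.to_text() for c in self.conditions) or "TRUE"
        return (f"Tree {self.tree:02d}: Rule {self.rule:02d}: IF {body} "
                f"THEN AVERAGE VALUE OF {target} IS {self.value:g}")

    def to_json(self) -> str:
        # Machine-readable form for users who need something other than prose.
        return json.dumps(asdict(self))


rule = Rule(tree=1, rule=1,
            conditions=[Condition("PAY_0", "<", 1.5, or_null=True),
                        Condition("PAY_2", "<", 0.5)],
            value=0.123)
print(rule.to_plain_english("DEFAULT PAYMENT NEXT MONTH"))
print(rule.to_json())
```

New output formats (a table, SQL `CASE` expressions, and so on) then become extra renderers on `Rule` rather than changes to the tree-walking code.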

exalate-issue-sync[bot] commented 1 year ago

Patrick Hall commented: This is a common representation of what we are asking for, highlighted in red.

[Screenshot: Screen Shot 2019-12-02 at 3.53.21 PM.png, listed under Attachments]

It’s also available in XGBoost via {{model.get_booster().dump_model(...)}}.

I’m fine with presenting the rules as a sorted table as well, but the rule list is a fairly common feature that people expect.
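Because XGBoost can already dump trees as text, here is a sketch of turning one tree from such a dump into the same kind of rule list. It assumes the default text format (`id:[feature<threshold] yes=..,no=..,missing=..` for splits, `id:leaf=value` for leaves), handles numeric splits only, and uses a made-up dump; `dump_to_rules` is a hypothetical helper. In XGBoost the `yes` branch is taken when `feature < threshold`, and leaf values are raw margin contributions rather than averages of the target.

```python
import re
from typing import Dict, List, Tuple

# Assumed default text-dump syntax; optional ",gain=...,cover=..." suffixes are ignored.
SPLIT = re.compile(r"^(\d+):\[([^<\]]+)<([^\]]+)\] yes=(\d+),no=(\d+),missing=(\d+)")
LEAF = re.compile(r"^(\d+):leaf=([^,\s]+)")


def dump_to_rules(tree_text: str) -> List[Tuple[str, float]]:
    """Turn one tree of an XGBoost text dump into (conditions, leaf value) pairs."""
    splits: Dict[int, Tuple[str, float, int, int, int]] = {}
    leaves: Dict[int, float] = {}
    for line in tree_text.splitlines():
        line = line.strip()  # leading tabs only encode depth
        split = SPLIT.match(line)
        if split:
            node, feat, thr, yes, no, missing = split.groups()
            splits[int(node)] = (feat, float(thr), int(yes), int(no), int(missing))
            continue
        leaf = LEAF.match(line)
        if leaf:
            leaves[int(leaf.group(1))] = float(leaf.group(2))

    rules: List[Tuple[str, float]] = []

    def walk(node: int, conds: List[str]) -> None:
        if node in leaves:
            rules.append((" AND ".join(conds) or "TRUE", leaves[node]))
            return
        feat, thr, yes, no, missing = splits[node]
        left, right = f"{feat} < {thr:g}", f"{feat} >= {thr:g}"
        # The node's "missing" id says which branch receives missing values.
        if missing == yes:
            left += " OR NULL"
        else:
            right += " OR NULL"
        walk(yes, conds + [left])
        walk(no, conds + [right])

    walk(0, [])
    return rules


# Made-up dump of a single tree, shaped like one element of
# model.get_booster().get_dump() for a model trained with named features.
DUMP = """0:[PAY_0<1.5] yes=1,no=2,missing=1
\t1:[PAY_2<0.5] yes=3,no=4,missing=4
\t\t3:leaf=0.123
\t\t4:leaf=0.321
\t2:leaf=0.6
"""
for conds, value in dump_to_rules(DUMP):
    print(f"IF {conds} THEN {value:g}")
```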

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: Resolved in [#4771|https://github.com/h2oai/h2o-3/pull/4771].

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4007 Assignee: Zuzana Olajcová Reporter: Erin LeDell State: Resolved Fix Version: N/A Attachments: Available (Count: 1) Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/4771
https://github.com/h2oai/h2o-3/pull/5617

Attachments From Jira

Attachment Name: Screen Shot 2019-12-02 at 3.53.21 PM.png Attached By: Patrick Hall File Link: https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4007/Screen Shot 2019-12-02 at 3.53.21 PM.png