Patrick Hall commented: Is there any way to abstract this away from h2o3 so that it could be used easily from h2o4, secure, etc.?
Are we planning to have POJOs in h2o4 and can we extract rules from the POJO files themselves?
Patrick Hall commented: - Think of reading from MOJO instead.
Patrick Hall commented: First rough cuts here: https://github.com/h2oai/mli/blob/master/notebooks/dt_surrogate.ipynb
Patrick Hall commented: Basically completed in H2O Driverless AI project. Closing.
Patrick Hall commented: Available in Driverless AI.
Gregory Kanevsky commented: reopening.
Proposal: generate the rules as a sequence of plain-language rules, for both classification and regression models, like this:
{noformat}
Tree 01:
Rule 01: IF PAY_0 < 1.5 OR NULL AND PAY_2 < 0.5 AND BILL_AMT > 5533 THEN AVERAGE VALUE OF DEFAULT PAYMENT NEXT MONTH IS 0.123
Rule 02: IF PAY_0 >= 1.5 AND PAY_2 >= 0.5 OR NULL AND BILL_AMT <= 5533 OR NULL THEN AVERAGE VALUE OF DEFAULT PAYMENT NEXT MONTH IS 0.321
...
{noformat}
Sort the rules in one of two ways:
option 1: ascending order of leaf values
option 2: tree traversal order of the leaves
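To make the proposal concrete, here is a minimal sketch of how such rules could be produced from a single tree. It is not the H2O implementation: the `Leaf` and `Split` classes, the `na_left` flag, and the function names are all hypothetical, and the leaf values are made up for illustration. It assumes binary numeric splits where rows with `feature < threshold` go left, and it supports both proposed orderings.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float  # average response in this leaf (illustrative numbers only)


@dataclass
class Split:
    feature: str
    threshold: float
    left: Union["Split", Leaf]   # rows with feature < threshold
    right: Union["Split", Leaf]  # rows with feature >= threshold
    na_left: bool = True         # hypothetical flag: missing values follow the left branch


def extract_rules(node, conditions=()) -> List[Tuple[List[str], float]]:
    """Return (conditions, leaf value) pairs in left-to-right leaf order."""
    if isinstance(node, Leaf):
        return [(list(conditions), node.value)]
    left = f"{node.feature} < {node.threshold}" + (" OR NULL" if node.na_left else "")
    right = f"{node.feature} >= {node.threshold}" + ("" if node.na_left else " OR NULL")
    return (extract_rules(node.left, conditions + (left,))
            + extract_rules(node.right, conditions + (right,)))


def format_rules(tree, response: str, sort_by_value: bool = False) -> List[str]:
    """Render rules as plain-language strings.

    sort_by_value=False keeps tree traversal order (option 2);
    sort_by_value=True sorts by ascending leaf value (option 1).
    """
    rules = extract_rules(tree)
    if sort_by_value:
        rules = sorted(rules, key=lambda rule: rule[1])
    return [
        f"Rule {i:02d}: IF {' AND '.join(conds)} "
        f"THEN AVERAGE VALUE OF {response} IS {value}"
        for i, (conds, value) in enumerate(rules, start=1)
    ]


# A small hand-built tree; feature names follow the example in this thread.
tree = Split(
    "PAY_0", 1.5,
    left=Split("PAY_2", 0.5, Leaf(0.25), Leaf(0.123), na_left=False),
    right=Leaf(0.321),
)

for line in format_rules(tree, "DEFAULT PAYMENT NEXT MONTH", sort_by_value=True):
    print(line)
```

Keeping `extract_rules` separate from the string formatting means the same (conditions, value) pairs could later feed a table or another output format without changing the traversal.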
Gregory Kanevsky commented: The problem with plain-English rules is that they are not flexible: they do report the tree rules, but a user who wants something slightly different would find them hard to use. We should start with plain English while keeping the option of extending the functionality in future releases.
Patrick Hall commented: This is a common representation of what we are asking for, highlighted in red.
!Screen Shot 2019-12-02 at 3.53.21 PM.png|width=746,height=1230!
It’s also available in XGBoost via {{model.get_booster().dump_model(...)}}
I’m fine with presenting the rules as a sorted table as well, but the rule list is a fairly common feature that people expect.
Zuzana Olajcová commented: Resolved in [#4771|https://github.com/h2oai/h2o-3/pull/4771].
JIRA Issue Migration Info
Jira Issue: PUBDEV-4007 Assignee: Zuzana Olajcová Reporter: Erin LeDell State: Resolved Fix Version: N/A Attachments: Available (Count: 1) Development PRs: Available
Linked PRs from JIRA
https://github.com/h2oai/h2o-3/pull/4771 https://github.com/h2oai/h2o-3/pull/5617
Attachments From Jira
Attachment Name: Screen Shot 2019-12-02 at 3.53.21 PM.png Attached By: Patrick Hall File Link:https://h2o-3-jira-github-migration.s3.amazonaws.com/PUBDEV-4007/Screen Shot 2019-12-02 at 3.53.21 PM.png
Two things:
I think there is a tool for this in Python -- I saw [this|http://stackoverflow.com/questions/20224526/how-to-extract-the-decision-rules-from-scikit-learn-decision-tree], but I recall there being a separate Python module that does this.
There are also several tools that do this in [R|http://stackoverflow.com/questions/29618490/get-decision-tree-rule-path-pattern-for-every-row-of-predicted-dataset-for-rpart].
Related: https://0xdata.atlassian.net/browse/PUBDEV-4324