Open exalate-issue-sync[bot] opened 1 year ago
Zuzana Olajcová commented: Hi [~accountid:557058:f0137791-c6cb-47bd-bcce-fc81ad4cfefa], I am not able to reproduce this. Can you please provide a reproducible example? Thanks!
Megan Kurka commented: I no longer have the example I used, but I still see that it is overfitting on categoricals.
Zuzana Olajcová commented: 👍 OK, I will add an option to turn on one-hot encoding in this JIRA. Lowering max_num_rules could also help reduce the overfitting.
Zuzana Olajcová commented: next steps from call with [~accountid:557058:f0137791-c6cb-47bd-bcce-fc81ad4cfefa] :
Zuzana Olajcová commented: Moving to open as blocked by https://h2oai.atlassian.net/browse/PUBDEV-8133
JIRA Issue Details
Jira Issue: PUBDEV-8011
Assignee: Zuzana Olajcová
Reporter: Megan Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
Rule looks like this: (AirTime in {104.0, 105.0, 110.0, 111.0, 112.0, 113.0, 114.0, 120.0, 121.0, 124.0, 125.0, 126.0, 127.0, 128.0, 130.0, 131.0, 132.0, 133.0, 136.0, 138.0, 139.0, 14.0, 140.0, 142.0, 146.0, 147.0, 148.0, 15.0, 150.0, 151.0
even though AirTime is an integer column.
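A quick check on the values quoted above: they appear in string (lexicographic) order rather than numeric order, since 14.0 sits between 139.0 and 140.0. That is consistent with the AirTime values being handled as categorical levels somewhere during rule generation, though this is only an inference from the ordering and is not confirmed in this thread. The sketch below uses plain Python on the visible part of the truncated rule and does not call H2O.

```python
# Values copied, in order, from the visible part of the truncated rule above.
rule_values = [
    "104.0", "105.0", "110.0", "111.0", "112.0", "113.0", "114.0",
    "120.0", "121.0", "124.0", "125.0", "126.0", "127.0", "128.0",
    "130.0", "131.0", "132.0", "133.0", "136.0", "138.0", "139.0",
    "14.0", "140.0", "142.0", "146.0", "147.0", "148.0",
    "15.0", "150.0", "151.0",
]

# String sorting compares character by character, so "139.0" < "14.0" < "140.0".
matches_string_order = rule_values == sorted(rule_values)

# Numeric sorting would put 14.0 and 15.0 first instead.
matches_numeric_order = rule_values == sorted(rule_values, key=float)

print("matches string order: ", matches_string_order)
print("matches numeric order:", matches_numeric_order)
```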
Using allyears2k data for this.
It also seems like the rules may be overfitting to categoricals. It would be nice to have an option to turn on one-hot encoding for this, which would also help simplify the rules.
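To illustrate the one-hot request, here is a minimal sketch in plain Python. It is not the H2O implementation: the `one_hot` helper and the sample rows are made up for illustration. The idea is that each level becomes its own 0/1 column, so a condition would test a single level instead of a long `in {...}` set.

```python
def one_hot(rows, column):
    """Replace `column` in each row dict with 0/1 indicator columns, one per level.

    Hypothetical helper for illustration; not part of H2O.
    """
    levels = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator column per observed level.
        for level in levels:
            new_row[f"{column}.{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Made-up rows for illustration only; not taken from allyears2k.
rows = [
    {"Origin": "SFO", "AirTime": 104},
    {"Origin": "ORD", "AirTime": 120},
    {"Origin": "SFO", "AirTime": 15},
]

encoded = one_hot(rows, "Origin")
for row in encoded:
    print(row)
```

With indicator columns like these, a rule could read as a single comparison on `Origin.SFO` rather than a set-membership test over many levels.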