h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

RuleFit treating integers as categorical #7637

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Rule looks like this: (AirTime in {104.0, 105.0, 110.0, 111.0, 112.0, 113.0, 114.0, 120.0, 121.0, 124.0, 125.0, 126.0, 127.0, 128.0, 130.0, 131.0, 132.0, 133.0, 136.0, 138.0, 139.0, 14.0, 140.0, 142.0, 146.0, 147.0, 148.0, 15.0, 150.0, 151.0

even though AirTime is an integer column.
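One detail visible in the rule above: the levels appear in string order (139.0, 14.0, 140.0 and 148.0, 15.0, 150.0) rather than numeric order, which is consistent with the values being handled as categorical labels. The sketch below is plain Python, not H2O code, and the short `levels` list is just a subset copied from the rule for illustration.

```python
# Subset of the levels shown in the rule, kept as strings (categorical labels).
levels = ["139.0", "14.0", "140.0", "148.0", "15.0", "150.0"]

# Sorting labels as strings compares character by character,
# so "14.0" lands between "139.0" and "140.0".
as_strings = sorted(levels)

# Sorting the same labels by numeric value gives the order an
# integer column would be expected to show.
as_numbers = sorted(levels, key=float)

print(as_strings)
print(as_numbers)
```

If the order printed for `as_strings` matches what a rule shows, that is a hint (not proof) that the column was parsed or converted as categorical somewhere before rule generation.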

Using allyears2k data for this.

It also seems like the rules may be overfitting to categorical columns. It would be nice to have an option to turn on one-hot encoding for them, which would also help simplify the rules.
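To illustrate the request, here is a minimal sketch of one-hot encoding in plain Python. It is not H2O's implementation; the `one_hot` helper, the `Carrier` column name, and the sample values are all hypothetical. The point is that each level becomes its own 0/1 column, so a rule can test a single indicator instead of membership in a long set of levels.

```python
def one_hot(column_name, values):
    """Return a dict mapping '<column>.<level>' to a list of 0/1 indicators.

    Hypothetical helper for illustration only; not part of H2O.
    """
    levels = sorted(set(values))
    return {
        f"{column_name}.{level}": [1 if v == level else 0 for v in values]
        for level in levels
    }


# Hypothetical categorical column with three levels.
carriers = ["AA", "UA", "AA", "DL"]
encoded = one_hot("Carrier", carriers)

for name, indicators in encoded.items():
    print(name, indicators)
```

With this encoding, a condition such as `Carrier in {AA, DL, ...}` can be replaced by conditions on individual indicator columns such as `Carrier.AA == 1`, which is shorter to read.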

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: Hi, I am not able to reproduce this. Can you please provide a reproducible example? Thanks!

exalate-issue-sync[bot] commented 1 year ago

Megan Kurka commented: I don't have the example I used anymore, but I still see that it is overfitting on categorical columns.

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: 👍 OK, I will add an option to turn on one-hot encoding as part of this JIRA. Lowering max_num_rules could also help reduce the overfitting.

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: Next steps from the call:

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: Moving to open, as this is blocked by https://h2oai.atlassian.net/browse/PUBDEV-8133

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8011
Assignee: Zuzana Olajcová
Reporter: Megan Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A