h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

RuleFit: fix case when constant columns are created in the training frame for GLM #7324

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Reproducible by:

{code:java}
// Load the test dataset and configure RuleFit.
final Frame fr = Scope.track(parseTestFile("missing.csv"));
RuleFitModel.RuleFitParameters params = new RuleFitModel.RuleFitParameters();
params._seed = -1;
params._response_column = "response";
params._train = fr._key;
params._max_num_rules = 50;
params._max_rule_length = 7;
params._min_rule_length = 3;

// Train the model.
RuleFitModel model = new RuleFit(params).trainModel().get();
Scope.track_generic(model);

// Convert the trained model to a MOJO and score a single row.
MojoModel mojoModel = model.toMojo();
mojoModel.score0(new double[] {0, 0, 3}, new double[] {0, 0});
{code}

missing.csv is the dataset used in the mojoland tests.

exalate-issue-sync[bot] commented 1 year ago

Zuzana Olajcová commented: This is currently handled by the {{glmParameters._ignore_const_cols = false;}} line in hex.rulefit.RuleFit#initGLMParameters, so that constant columns are preserved and the MOJO does not miss them. The full fix requires reworking the MOJO.
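To illustrate why preserving constant columns matters, here is a minimal, self-contained sketch in plain Java. It is not H2O code: the class `ConstColsSketch`, the method `keptColumns`, and the small rule-indicator matrix are all hypothetical. It assumes the underlying problem is a column-count mismatch, where training drops a constant column but the scorer still produces one value per rule.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstColsSketch {

    // Hypothetical helper: returns the indices of columns kept for training.
    // A constant column is dropped only when ignoreConstCols is true.
    static List<Integer> keptColumns(double[][] data, boolean ignoreConstCols) {
        List<Integer> kept = new ArrayList<>();
        int ncols = data[0].length;
        for (int c = 0; c < ncols; c++) {
            boolean constant = true;
            for (int r = 1; r < data.length; r++) {
                if (data[r][c] != data[0][c]) {
                    constant = false;
                    break;
                }
            }
            if (!(ignoreConstCols && constant)) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical rule-indicator matrix: the middle column is constant
        // (that rule never fires on the training data).
        double[][] ruleFeatures = {
            {1, 0, 0},
            {0, 0, 1},
            {1, 0, 1},
        };

        // With constant columns ignored, training sees fewer columns
        // than a scorer that emits one value per rule.
        System.out.println(keptColumns(ruleFeatures, true));   // prints [0, 2]

        // With constant columns preserved, the layouts stay aligned.
        System.out.println(keptColumns(ruleFeatures, false));  // prints [0, 1, 2]
    }
}
```

In this sketch, keeping the constant column keeps the trained layout and the scoring layout the same length, which is the effect the comment above attributes to `_ignore_const_cols = false`.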

h2o-ops-ro commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8333
Assignee: Zuzana Olajcová
Reporter: Zuzana Olajcová
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: Available

h2o-ops-ro commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/mojoland/pull/21