h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Set Random Forest distribution to binomial instead of multinomial in AutoML binomial problems #7756

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Even though Random Forest (RF) doesn't have a "distribution" param, we set it automatically in AutoML. However, both binomial and multinomial problems currently get "multinomial". We should add an extra if statement to handle the two cases separately (even though RF doesn’t use this info). The current behavior is confusing because the resulting RF model shows the wrong distribution when users inspect the model's params after training.

Current logic:

{code:java}
if (_parms._distribution == DistributionFamily.AUTO) {
  if (_nclass == 1) _parms._distribution = DistributionFamily.gaussian;
  if (_nclass >= 2) _parms._distribution = DistributionFamily.multinomial;
}
{code}
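One way the extra branch could look is sketched below. This is only an illustration, not the actual patch: `DistributionSketch` and `resolve` are hypothetical names, the nested enum is a reduced stand-in for H2O's `DistributionFamily`, and it assumes `bernoulli` is the constant that represents the two-class (binomial) case.

```java
// Hypothetical, self-contained sketch of the proposed branching (not H2O source code).
final class DistributionSketch {

  // Reduced stand-in for H2O's DistributionFamily enum.
  // Assumption: `bernoulli` is the value used for two-class (binomial) problems.
  enum DistributionFamily { AUTO, gaussian, bernoulli, multinomial }

  // Resolves AUTO from the number of response classes; any explicit choice is kept as-is.
  static DistributionFamily resolve(DistributionFamily requested, int nclass) {
    if (requested != DistributionFamily.AUTO) return requested;
    if (nclass == 1) return DistributionFamily.gaussian;    // regression
    if (nclass == 2) return DistributionFamily.bernoulli;   // binomial: the new, separate case
    if (nclass > 2) return DistributionFamily.multinomial;  // three or more classes
    return requested;                                       // nclass < 1: left as AUTO, as in the current logic
  }

  public static void main(String[] args) {
    // Print the resolved distribution for 1, 2, and 3 response classes.
    for (int nclass = 1; nclass <= 3; nclass++) {
      System.out.println(nclass + " -> " + resolve(DistributionFamily.AUTO, nclass));
    }
  }
}
```

The only behavioral difference from the current logic is the `nclass == 2` branch; regression and problems with three or more classes resolve exactly as before.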

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7889
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A