h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add stratified sampling per-tree for DRF/GBM #9749

Closed by exalate-issue-sync[bot] 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We want each tree in the ensemble to sample rows from the full training dataset using a class-specific sampling rate, rather than only a single global sampling factor (sample_rate). This can help with imbalanced datasets.

float[] sample_rate_per_class
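
To make the request concrete, here is a minimal sketch of per-class row sampling for a single tree. It is an illustration only, not H2O's implementation: the helper name `sample_rows_per_class`, the use of independent per-row (Bernoulli) draws without replacement, and seeding by tree index are all assumptions. Only the parameter name `sample_rate_per_class` comes from this issue.

```python
import random
from collections import Counter


def sample_rows_per_class(labels, sample_rate_per_class, seed=None):
    """Return the row indices kept for one tree (hypothetical helper).

    labels: class index (0..K-1) of each training row.
    sample_rate_per_class: one sampling rate in [0, 1] per class.
    Each row is kept independently with the rate of its own class.
    """
    rng = random.Random(seed)
    return [
        i
        for i, y in enumerate(labels)
        if rng.random() < sample_rate_per_class[y]
    ]


# Imbalanced toy data: 1000 rows of class 0 and 50 rows of class 1.
labels = [0] * 1000 + [1] * 50
# Keep about 10% of the majority class and all of the minority class.
rates = [0.1, 1.0]

# Every tree draws its own sample; the tree index is used as the seed
# here purely for reproducibility of the illustration.
for tree in range(3):
    rows = sample_rows_per_class(labels, rates, seed=tree)
    print(tree, dict(Counter(labels[i] for i in rows)))
```

With these rates each tree sees all minority-class rows and roughly a tenth of the majority-class rows, so the class balance within a tree is much closer to even than in the full dataset.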

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-2815
Assignee: Arno Candel
Reporter: Arno Candel
State: Resolved
Fix Version: 3.8.2.2
Attachments: N/A
Development PRs: N/A