Enable h2o.stackedensemble() to build ensembles using a subset of the training set of base models

h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

http://h2o.ai

Apache License 2.0


Enable h2o.stackedensemble() to build ensembles using a subset of the training set of base models #12882

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Currently, h2o.stackedensemble() must be trained on exactly the same dataset that the base models were trained on. However, the algorithm does not require this. According to some literature, the base models can be built on a dataset D and then blended separately on D1, D2, ..., Dn, which are subsets of D, with different weights. Since D1, D2, ..., Dn are subsets of D, the cross-validated predictions for their rows are already available from the base models.
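To make the request concrete, here is a minimal sketch of the idea in plain Python. It does not use the H2O API, and every name in it (`cv_predictions`, `blend_weight`, the toy base learners, the synthetic data) is hypothetical and chosen only for illustration. It assumes that "different weights" means a separate set of blending weights per subset: out-of-fold predictions are computed once on the full dataset D, then reused to fit blending weights on two subsets D1 and D2 without retraining any base model.

```python
import random

def fit_mean(xs, ys):
    """Toy base learner 1: always predicts the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Toy base learner 2: least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x

def cv_predictions(fit, xs, ys, k):
    """Out-of-fold prediction for every row of D (round-robin folds)."""
    n = len(xs)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % k == fold:
                preds[i] = model(xs[i])
    return preds

def blend_weight(p1, p2, ys, rows):
    """Weight w minimizing sum((w*p1 + (1-w)*p2 - y)^2) over the given rows."""
    num = sum((p1[i] - p2[i]) * (ys[i] - p2[i]) for i in rows)
    den = sum((p1[i] - p2[i]) ** 2 for i in rows)
    return num / den if den else 0.5

# Synthetic dataset D (an assumption for the demo): linear trend in the
# first half, flat response in the second half.
random.seed(0)
n = 60
xs = [i / 10 for i in range(n)]
ys = [(2 * x if i < 30 else 6.0) + random.gauss(0, 0.3)
      for i, x in enumerate(xs)]

# Base models are cross-validated once, on all of D.
p_lin = cv_predictions(fit_linear, xs, ys, k=5)
p_mean = cv_predictions(fit_mean, xs, ys, k=5)

# The blender is fitted separately on each subset, reusing the same
# out-of-fold predictions.
d1 = list(range(0, 30))
d2 = list(range(30, 60))
w1 = blend_weight(p_lin, p_mean, ys, d1)
w2 = blend_weight(p_lin, p_mean, ys, d2)
print("weight of linear model on D1:", w1)
print("weight of linear model on D2:", w2)
```

The point of the sketch is that `blend_weight` only reads rows of the cached out-of-fold predictions, so restricting it to a subset of D needs nothing beyond a row filter. Whether H2O can expose this depends on how it stores the cross-validated holdout predictions of the base models.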

exalate-issue-sync[bot] commented 1 year ago

Tomas Fryda commented: Hi [~accountid:557058:3153fc68-3f65-4f5d-843b-cb78a7048231], I am looking into this issue and was wondering whether https://0xdata.atlassian.net/browse/PUBDEV-4916 would cover your use case. If it wouldn't, could you please clarify the request a bit more? A concrete use case would help. Thanks!

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6035
Assignee: Tomas Fryda
Reporter: Yu Cao
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A