h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Create a custom AutoML strategy for classification with a high number of classes #8770

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We should consider re-ordering the algorithms when the response has a high number of classes. For example, above roughly 10 classes, we could switch to prioritizing GLMs and DNNs over tree-based methods.
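A minimal sketch of the proposed switch, assuming a strategy that only reorders a fixed list of algorithm families by class count. This is not H2O AutoML's scheduler: `DEFAULT_ORDER`, `MANY_CLASS_PREFERRED`, `algo_order`, and the threshold parameter are hypothetical names chosen for illustration, and the default order shown is a placeholder rather than H2O's real one.

```python
from typing import List

# Hypothetical placeholder priority order, for illustration only;
# it is NOT H2O AutoML's actual default modeling plan.
DEFAULT_ORDER = ["XGBoost", "GLM", "DRF", "GBM", "DeepLearning", "StackedEnsemble"]

# Families to promote when there are many classes (GLMs and DNNs, per the issue).
MANY_CLASS_PREFERRED = ["GLM", "DeepLearning"]


def algo_order(n_classes: int, threshold: int = 10) -> List[str]:
    """Return a training priority for a classifier with `n_classes` classes."""
    if n_classes <= threshold:
        return list(DEFAULT_ORDER)
    # Move the preferred families to the front while keeping the relative
    # order of everything else, so StackedEnsemble still trains last.
    rest = [a for a in DEFAULT_ORDER if a not in MANY_CLASS_PREFERRED]
    return MANY_CLASS_PREFERRED + rest
```

With a strict `> 10` cutoff, `algo_order(10)` keeps the default order, so the 10-class MNIST benchmark linked below would not trigger the switch; `algo_order(26)` returns `["GLM", "DeepLearning", "XGBoost", "DRF", "GBM", "StackedEnsemble"]`. Where exactly to put the threshold is the kind of choice the benchmarks would have to settle.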

Benchmark results on MNIST: https://www.kaggle.com/tunguz/mnist-with-h2o-automl/

exalate-issue-sync[bot] commented 1 year ago

Sebastien Poirier commented: Those changes should be driven by benchmarks: it’s not only the order that we can change, but also the weight (time + number of models) given to each grid search: in this case, we should also increase the weight of the DNN grid searches.

Also, still based on benchmarks, let’s see if we can’t change the order/weights for:

Note that the priority order matters mainly when the max runtime is small, whereas changing the weights (especially those of the grids) on top of that will affect the proportion of models of each type being trained.
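A minimal sketch of why order and weight play different roles, assuming a simplified scheduler in which each step requests `weight * unit_secs` seconds and steps are served in priority order until `max_runtime_secs` is used up. Only the time part of a weight is modeled (not the number of models), and `allocate`, `unit_secs`, and the step names are hypothetical; H2O AutoML's actual budget logic is not shown here.

```python
from typing import Dict, List, Tuple


def allocate(steps: List[Tuple[str, float]], max_runtime_secs: float,
             unit_secs: float = 60.0) -> Dict[str, float]:
    """Grant each step up to weight * unit_secs seconds, in priority order."""
    remaining = max_runtime_secs
    plan: Dict[str, float] = {}
    for name, weight in steps:
        # A step gets what it asked for, or whatever budget is left.
        granted = min(weight * unit_secs, remaining)
        plan[name] = granted
        remaining -= granted
    return plan


# Hypothetical plan: DNN grid weighted up, as suggested in the comment.
steps = [("GLM", 1), ("DeepLearning_grid", 3), ("GBM_grid", 2)]
small = allocate(steps, max_runtime_secs=90)
large = allocate(steps, max_runtime_secs=600)
```

With the small budget, `small` is `{"GLM": 60.0, "DeepLearning_grid": 30.0, "GBM_grid": 0.0}`: the order decides which steps run at all. With the large budget, `large` is `{"GLM": 60.0, "DeepLearning_grid": 180.0, "GBM_grid": 120.0}`: every step runs, and the weights set the share of time (and hence models) each family receives.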

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6864
Assignee: Sebastien Poirier
Reporter: Erin LeDell
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A