h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Limit impact of high cardinality features in deep learning in AutoML #7644

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

AutoML can have a hard time with datasets that contain high-cardinality columns, e.g., Albert [1]. One reason is that DeepLearning one-hot encodes the dataset, which yields over 1M columns.

[1] https://www.openml.org/d/41147
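
To illustrate why one-hot encoding is the problem, here is a minimal sketch in plain Python. It does not reproduce the fix from the linked PRs or any H2O API. The function names, the column names, and the cardinalities are all hypothetical and are not taken from the Albert dataset. The sketch only shows that the width after one-hot encoding is roughly the sum of the level counts across the categorical columns, so a few high-cardinality columns dominate the total.

```python
from typing import Dict, List


def one_hot_width(cardinalities: Dict[str, int], numeric_cols: int = 0) -> int:
    """Return the column count after one-hot encoding every categorical column.

    Each categorical column contributes one indicator column per level;
    numeric columns pass through unchanged.
    """
    return numeric_cols + sum(cardinalities.values())


def columns_over_limit(cardinalities: Dict[str, int], max_levels: int) -> List[str]:
    """Return the names of columns with more than `max_levels` levels, sorted."""
    return sorted(name for name, levels in cardinalities.items() if levels > max_levels)


# Hypothetical cardinalities, chosen only for illustration.
example = {"user_id": 900_000, "zip_code": 40_000, "country": 200, "plan": 4}

width = one_hot_width(example, numeric_cols=10)
wide = columns_over_limit(example, max_levels=1_000)

print(f"columns after one-hot encoding: {width}")
print(f"columns above the level limit: {wide}")
```

In this made-up example, two of the four categorical columns account for nearly all of the expanded width. That is the kind of column a cap on the number of levels, or a different encoding, would target.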

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8004
Assignee: Tomas Fryda
Reporter: Tomas Fryda
State: Closed
Fix Version: 3.32.1.6
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/5344
https://github.com/h2oai/h2o-3/pull/5631