When importing a file that has high-cardinality columns with a recent version of h2o, I get the following error:
Error: DistributedException from /172.16.2.196:55888: 'Exceeded categorical limit on column #3 (using 1-based indexing). Consider reparsing this column as a string.', caused by water.parser.ParseDataset$H2OParseException: Exceeded categorical limit on column #3 (using 1-based indexing). Consider reparsing this column as a string.
Execution halted
I understand that this is caused by the column being parsed as type enum, and that the message suggests using string instead. Still, we could remove the categorical limit and allow high-cardinality columns to be stored as enum, rather than forcing users to use string. Using the enum type rather than string is likely to speed up operations like h2o.merge, even for a high-cardinality enum.
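To make the trade-off concrete, here is a minimal Python sketch of dictionary encoding, which is, as I understand it, roughly what an enum column amounts to: each distinct string is stored once in a domain, and rows hold small integer codes. This is not H2O's parser. The names `encode_as_enum` and `MAX_LEVELS` are hypothetical, and the cap of 5 levels is an arbitrary value chosen for illustration, not H2O's actual limit.

```python
from typing import Dict, List, Tuple

# Hypothetical cap, chosen only for this illustration; not H2O's actual limit.
MAX_LEVELS = 5


def encode_as_enum(
    values: List[str], max_levels: int = MAX_LEVELS
) -> Tuple[List[int], List[str]]:
    """Dictionary-encode strings: return (integer codes, domain of distinct levels)."""
    level_to_code: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in level_to_code:
            # A new level would grow the domain; refuse once the cap is reached.
            if len(level_to_code) >= max_levels:
                raise ValueError(
                    f"Exceeded categorical limit of {max_levels} levels; "
                    "consider keeping this column as a string."
                )
            level_to_code[value] = len(level_to_code)
        codes.append(level_to_code[value])
    # dict preserves insertion order, so position i in the domain is code i.
    return codes, list(level_to_code)


codes, domain = encode_as_enum(["a", "b", "a", "c"])
print(codes)   # [0, 1, 0, 2]
print(domain)  # ['a', 'b', 'c']
```

Once a column is encoded this way, matching keys means comparing integers rather than strings, which is why I would expect a merge on an enum column to be cheaper. Lifting the cap would only change the point at which the `ValueError` branch fires.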