Currently toNumeric() looks at the first item in the domain list. If that item can be interpreted as an integer, the domain values are used to create the resulting numeric vector. If it cannot be read as an integer, the enumeration levels are used instead. While occasionally handy, this inconsistency will look like magic to many users, because the result depends on the content of a single domain entry. Instead, the return value should always be the enumeration levels (à la R). To use the domain for the result, as.numeric(as.character(foo)) should be used in R, with an optimization in the Rapids tree walker to collapse the two operations into a single call to VecUtils.categoricalDomainsToNumeric() (thus skipping the unneeded creation of a temporary string column).
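To make the difference concrete, here is a minimal sketch in plain Python. It is a toy model, not H2O code: a categorical column is represented as a list of integer level codes plus a list of string labels (the domain), levels are assumed to be 0-based, and the function names to_numeric_current and to_numeric_proposed are hypothetical.

```python
from typing import List


def to_numeric_current(codes: List[int], domain: List[str]) -> List[float]:
    """Toy model of the current behavior: decide based on the first domain item only."""
    try:
        int(domain[0])
    except ValueError:
        # First label is not an integer: return the enumeration levels.
        return [float(c) for c in codes]
    # First label is an integer: return the domain values instead.
    return [float(int(domain[c])) for c in codes]


def to_numeric_proposed(codes: List[int], domain: List[str]) -> List[float]:
    """Toy model of the proposed behavior: always return the enumeration levels."""
    return [float(c) for c in codes]


codes = [2, 0, 1]
int_domain = ["10", "20", "30"]
str_domain = ["a", "b", "c"]

# Same codes, different result depending on what the labels look like.
current_int = to_numeric_current(codes, int_domain)   # domain values
current_str = to_numeric_current(codes, str_domain)   # enumeration levels

# The proposed behavior ignores the labels entirely.
proposed_int = to_numeric_proposed(codes, int_domain)
proposed_str = to_numeric_proposed(codes, str_domain)
```

With the proposed behavior, both columns convert the same way, so the caller no longer needs to know what the labels look like to predict the result.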
VecUtils.categoricalDomainsToNumeric() should be made to handle either integer or real values.
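A rough sketch of what handling both integer and real labels could look like, again as a plain Python model rather than the actual Java implementation. The function name categorical_domains_to_numeric only mirrors the method named above, 0-based level codes are an assumption, and how unparseable labels should be treated is not specified here, so this sketch simply lets the parse error propagate.

```python
from typing import List


def categorical_domains_to_numeric(codes: List[int], domain: List[str]) -> List[float]:
    """Map each level code to the numeric value of its label.

    Each label is parsed once, up front, so the cost of parsing scales with
    the number of levels rather than the number of rows. float() accepts
    integer labels ("3") as well as real ones ("1.5", "-2e1").
    """
    parsed = [float(label) for label in domain]  # raises ValueError on a non-numeric label
    return [parsed[c] for c in codes]


mixed = categorical_domains_to_numeric([1, 0, 2, 1], ["1.5", "3", "-2e1"])
```

Because the labels are converted directly from the domain, no intermediate string column is built, which is the saving the collapsed as.numeric(as.character(foo)) call is meant to provide.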
This will be a user-facing change and should be highlighted in whatever release it is a part of. It will also break regression tests that rely on the current behavior.