h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add to docs that transform only works on numerical columns #8257

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Update the transform docs to indicate that it only works with numerical data. See: https://h2oai.slack.com/archives/C04KNHH2H/p1584389368047200

Original question (March 16, 2020):

When transform = "standardize" for GLRM, it appears that only numeric columns are standardized and categorical/binary columns are skipped (which seems like the right approach): https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/main/java/hex/glrm/GLRM.java#L356-L358 and https://github.com/h2oai/h2o-3/blob/master/h2o-algos/src/main/java/hex/glrm/GLRM.java#L1030. Is that interpretation correct? Either way, is it possible to clarify the treatment of categorical/binary variables in the documentation for transform? https://github.com//h2oai/h2o-3/blob/master/h2o-docs/src/product/data-science/algo-params/transform.rst

exalate-issue-sync[bot] commented 1 year ago

Angela Bartz commented: Pull request merged into rel-yule.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Chris:

I messed up my answer to your question in the PR. Here is my answer again.

Standardization is only applied to numerical column types. Enum/binary columns are not affected by standardization.

Hope this is clear.

Wendy
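
For readers who want to see the behavior described in this thread, here is a minimal pure-Python sketch of "standardize numeric columns, leave enum/binary columns untouched". It is an illustration only, not H2O's implementation: the function name `standardize_numeric_columns`, the dict-of-lists frame layout, and the use of the sample standard deviation are all assumptions made for this example.

```python
from statistics import mean, stdev


def standardize_numeric_columns(columns):
    """Return a copy of `columns` with only the numeric columns standardized.

    `columns` maps a column name to a list of values (a hypothetical
    stand-in for a data frame). Non-numeric columns are copied unchanged.
    """
    result = {}
    for name, values in columns.items():
        # Treat a column as numeric only if every value is an int or float
        # (bool is excluded because it is a subclass of int in Python).
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric and len(values) > 1:
            mu = mean(values)
            # Assumption: sample standard deviation (n - 1 denominator).
            sigma = stdev(values)
            # Avoid division by zero for constant columns: center only.
            scale = sigma if sigma > 0 else 1.0
            result[name] = [(v - mu) / scale for v in values]
        else:
            # Categorical/binary (string-valued) columns are skipped.
            result[name] = list(values)
    return result


frame = {
    "age": [20, 30, 40],              # numeric: standardized
    "color": ["red", "blue", "red"],  # categorical: left as-is
    "member": ["yes", "no", "yes"],   # binary enum: left as-is
}
standardized = standardize_numeric_columns(frame)
```

In this sketch only `age` changes; `color` and `member` come back exactly as they went in, which mirrors the answer above that enum/binary columns are not affected by standardization.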