h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Documentation: Update GLM FAQ and missing_values_handling parameter regarding unseen categorical values #11176

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

From [~accountid:557058:2ceb7f2b-e7ca-465c-8e82-c046991100be]: Missing categorical levels are imputed with the mode, not a special missing level.

Unseen categorical levels are handled according to the missing_values_handling setting used during training.

If your missing value handling was set to mean imputation, the unseen levels are replaced by the most frequent level present in training (the mode).

If your missing value treatment was Skip, the variable is ignored for the given observation.

If you ran with use_all_factor_levels=False, that essentially means they are replaced by the reference level.
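The rules above can be sketched in a few lines of plain Python. This is only a toy illustration of the described behavior, not H2O's implementation: the helpers `fit_mode` and `resolve_level` are hypothetical names, and the strings "MeanImputation" and "Skip" are used here simply as labels for the two handling modes discussed in this thread. The use_all_factor_levels=False case is not modeled.

```python
from collections import Counter


def fit_mode(train_levels):
    """Return the most frequent non-missing level seen in training."""
    counts = Counter(v for v in train_levels if v is not None)
    return counts.most_common(1)[0][0]


def resolve_level(value, seen_levels, mode, handling="MeanImputation"):
    """Map a scoring-time value to the level a GLM-like model would use.

    Returns None when the variable is ignored for this observation.
    """
    if value is not None and value in seen_levels:
        return value  # known level: use it as-is
    if handling == "MeanImputation":
        return mode  # missing or unseen: fall back to the training mode
    if handling == "Skip":
        return None  # missing or unseen: ignore the variable
    raise ValueError(f"unsupported handling: {handling}")


# Hypothetical training column with one missing value
train_levels = ["red", "blue", "red", "green", None, "red"]
seen = {v for v in train_levels if v is not None}
mode = fit_mode(train_levels)

print(resolve_level("purple", seen, mode))                   # red
print(resolve_level("blue", seen, mode))                     # blue
print(resolve_level("purple", seen, mode, handling="Skip"))  # None
```

In this sketch a missing value (None) and an unseen level ("purple") take the same path, which matches the statement that unseen levels follow the missing values handling chosen at training time.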

exalate-issue-sync[bot] commented 1 year ago

Angela Bartz commented: Updated after review

GLM FAQ now states the following regarding missing values:

exalate-issue-sync[bot] commented 1 year ago

Angela Bartz commented: Pull request 1038 submitted.

exalate-issue-sync[bot] commented 1 year ago

Angela Bartz commented: Also updated the missing_values_handling parameter description as below:

"... Note that in Deep Learning, unseen categorical variables are imputed by adding an extra “missing” level. In GLM, unseen categorical levels are replaced by the most frequent level present in training (mod). Optionally, either algorithm can skip all rows with any missing values."
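To make the difference between the two imputation strategies in that description concrete, here is a minimal one-hot encoding sketch. It assumes a simple dense encoding purely for illustration; the function names `one_hot_extra_missing` and `one_hot_mode` are hypothetical, and H2O's internal encoding is not shown in this thread and may differ.

```python
def one_hot_extra_missing(value, levels):
    """Deep-Learning-style sketch: one column per training level
    plus a trailing "missing" column for missing or unseen values."""
    vec = [0] * (len(levels) + 1)
    if value in levels:
        vec[levels.index(value)] = 1
    else:
        vec[-1] = 1  # route missing/unseen values to the extra column
    return vec


def one_hot_mode(value, levels, mode):
    """GLM-style sketch: missing or unseen values are encoded
    as the most frequent training level (the mode)."""
    vec = [0] * len(levels)
    target = value if value in levels else mode
    vec[levels.index(target)] = 1
    return vec


# Hypothetical training levels; "red" is assumed to be the mode
levels = ["blue", "green", "red"]

print(one_hot_extra_missing("purple", levels))  # [0, 0, 0, 1]
print(one_hot_mode("purple", levels, "red"))    # [0, 0, 1]
print(one_hot_mode("green", levels, "red"))     # [0, 1, 0]
```

The extra column lets a model learn a separate effect for "missing", whereas mode substitution makes an unseen value indistinguishable from the most common training level.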

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4287
Assignee: Angela Bartz
Reporter: Angela Bartz
State: Resolved
Fix Version: 3.10.4.4
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/1038
