h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Check and make sure glm coefficients returned and the documentation description of them are consistent. #8390

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

When building a GLM model, we can set the parameter standardize to true or false. However, it is not clear whether the reported coefficients are the standardized ones or not. It is also not clear whether the documentation of those coefficients is correct.

This confusion popped up during a discussion between Wendy and Zuzana.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Here is the deal:

When standardize=true, the model fits the coefficients _beta on standardized numerical columns. However, _global_beta holds coefficients derived from _beta for use with non-standardized numerical columns. Hence, _global_beta can be used for scoring without first standardizing the numerical predictors.
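To make the standardize=true case concrete, here is a minimal sketch of how coefficients fit on standardized columns can be mapped back to coefficients for raw columns. This is only the standard algebra for x_std = (x - mean) / sigma; it is an assumption that _global_beta is derived this way, and the function name `destandardize_coefs` and the toy numbers are hypothetical, not H2O code.

```python
def destandardize_coefs(std_beta, std_intercept, means, sigmas):
    """Map coefficients for standardized columns to coefficients for raw columns.

    Assumes each column was standardized as (x - mean) / sigma.
    """
    raw_beta = [b / s for b, s in zip(std_beta, sigmas)]
    raw_intercept = std_intercept - sum(
        b * m / s for b, m, s in zip(std_beta, means, sigmas)
    )
    return raw_beta, raw_intercept


# Hypothetical column statistics and fitted coefficients (illustration only).
means, sigmas = [2.0, 20.0], [1.5, 8.0]
std_beta, std_intercept = [0.75, -1.6], 3.0

raw_beta, raw_intercept = destandardize_coefs(std_beta, std_intercept, means, sigmas)

# Score one raw row both ways; the linear predictors should agree.
row = [4.0, 12.0]
std_row = [(x - m) / s for x, m, s in zip(row, means, sigmas)]
score_std = std_intercept + sum(b * x for b, x in zip(std_beta, std_row))
score_raw = raw_intercept + sum(b * x for b, x in zip(raw_beta, row))
```

With coefficients converted this way, scoring a raw row gives the same linear predictor as standardizing the row and scoring with the original coefficients.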

When standardize=false, the model fits the coefficients _beta on non-standardized numerical columns. In this case, _global_beta is the same as _beta. However, if you are interested in the values of the coefficients that apply to standardized columns, you can call standardizedCoefficients, implemented by our own Zuzana Olajcová. When a user asks for the standardized coefficients, the following transformation is applied:

newbeta(1) = _beta(1) * sigma1, newbeta(2) = _beta(2) * sigma2, ...

newbeta(0) = _beta(0) + _beta(1) * mean1 + _beta(2) * mean2 + ...
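The transformation above can be checked numerically. The following is a minimal sketch in plain Python, not the H2O implementation: the helper names, the toy data, and the use of the population standard deviation are assumptions made for illustration. The identity holds for whichever sigma was used to standardize the columns.

```python
from statistics import mean, pstdev


def to_standardized_coefs(beta, intercept, means, sigmas):
    """Convert coefficients fit on raw columns to ones for standardized columns."""
    new_beta = [b * s for b, s in zip(beta, sigmas)]
    new_intercept = intercept + sum(b * m for b, m in zip(beta, means))
    return new_beta, new_intercept


def linear_predictor(beta, intercept, row):
    """Intercept plus the dot product of coefficients and one row."""
    return intercept + sum(b * x for b, x in zip(beta, row))


# Hypothetical toy data and coefficients (illustration only).
rows = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
beta, intercept = [0.5, -0.2], 1.5

# Per-column mean and standard deviation.
cols = list(zip(*rows))
means = [mean(c) for c in cols]
sigmas = [pstdev(c) for c in cols]

new_beta, new_intercept = to_standardized_coefs(beta, intercept, means, sigmas)

# Standardize every row, then compare the two ways of scoring.
std_rows = [[(x - m) / s for x, m, s in zip(r, means, sigmas)] for r in rows]
raw_preds = [linear_predictor(beta, intercept, r) for r in rows]
std_preds = [linear_predictor(new_beta, new_intercept, r) for r in std_rows]
```

Here raw_preds and std_preds agree up to floating-point error, which is the consistency the documentation should describe: the two coefficient sets represent the same model on differently scaled inputs.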

I need to port this into our documentation to make it clear.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7245
Assignee: Wendy
Reporter: Zuzana Olajcová
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A