h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Allow logistic regression with continuous outcome with range [0-1] #9980

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

When running h2o.glm with family = "binomial" and a continuous outcome in the range [0, 1], the following error is produced: "Binomial requires the response to be a 2-class categorical or a binary column (0/1)"
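The binomial log-likelihood does not itself require a 0/1 response. The block below is a short derivation under the usual logistic model (notation ours, not H2O's): $y_i \in [0, 1]$, covariate vector $x_i$, coefficients $\beta$. Read it as a quasi-likelihood when $y_i$ is fractional.

$$
\ell(\beta) = \sum_{i=1}^{n} \Big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \Big],
\qquad p_i = \frac{1}{1 + e^{-x_i^\top \beta}}
$$

$$
\nabla_\beta\, \ell(\beta) = \sum_{i=1}^{n} (y_i - p_i)\, x_i,
\qquad
\nabla^2_\beta\, \ell(\beta) = -\sum_{i=1}^{n} p_i (1 - p_i)\, x_i x_i^\top
$$

Because $\log p_i$ and $\log(1 - p_i)$ are both concave in $\beta$ and the weights $y_i$ and $1 - y_i$ are non-negative for any $y_i \in [0, 1]$, $\ell(\beta)$ stays concave. The Hessian does not depend on $y_i$ at all. The Newton/IRLS update is therefore the same as in the 0/1 case, and only the 0/1 check blocks it.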

Note that stats::glm and speedglm::speedglm in R can fit this model (with a warning). In theory, the current implementation of logistic regression in h2o.glm should also work if the check is simply relaxed. In practice, commenting out the above error check and the gains/lift table evaluation is enough to produce coefficients. However, these coefficients are wrong for solver = "L_BFGS" and appear correct for solver = "IRLSM" (based only on a single tiny test set).
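For reference, here is a minimal sketch of iteratively reweighted least squares (IRLS) for a logistic model whose response may be any value in [0, 1]. IRLS is the technique H2O's "IRLSM" solver is named after, but this is not H2O's code. The function name `fractional_logit_irls`, the single-predictor restriction, and the tolerance are hypothetical choices made to keep the example self-contained in plain Python.

```python
import math


def fractional_logit_irls(x, y, max_iter=50, tol=1e-10):
    """Hypothetical helper: fit logit(E[y | x]) = b0 + b1 * x by IRLS.

    The response y may be any value in [0, 1] (fractional / quasi-binomial),
    not just 0 or 1. Returns the tuple (b0, b1).
    """
    if any(not 0.0 <= yi <= 1.0 for yi in y):
        raise ValueError("response must lie in [0, 1]")
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate X^T W X (a 2x2 matrix) and X^T W z (a 2-vector),
        # where the design matrix X has columns [1, x].
        s00 = s01 = s11 = r0 = r1 = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)            # IRLS weight; does not depend on y
            z = eta + (yi - p) / w       # working response
            s00 += w
            s01 += w * xi
            s11 += w * xi * xi
            r0 += w * z
            r1 += w * xi * z
        # Solve the 2x2 weighted least-squares system by Cramer's rule.
        det = s00 * s11 - s01 * s01
        new_b0 = (s11 * r0 - s01 * r1) / det
        new_b1 = (s00 * r1 - s01 * r0) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


if __name__ == "__main__":
    # Continuous outcomes in [0, 1], e.g. predicted probabilities or proportions.
    print(fractional_logit_irls([0, 1, 2, 3, 4], [0.1, 0.3, 0.45, 0.7, 0.85]))
```

At convergence, the estimates satisfy the score equations sum(y - p) = 0 and sum(x * (y - p)) = 0. For the same data, they should agree (up to tolerance) with the point estimates from R's stats::glm(..., family = binomial) or family = quasibinomial, which share coefficients and differ only in the dispersion estimate.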

Such logistic regressions are often needed when implementing Q-learning algorithms in survival analysis, such as the sequential G-computation algorithm, or when performing Targeted Minimum Loss-Based Estimation (TMLE). I'd be happy to provide test code or use cases if desired.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3057
Assignee: New H2O Bugs
Reporter: Oleg Sofrygin
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A