H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
When running h2o.glm with family = "binomial" and a continuous outcome in the range [0, 1], the following error is produced: "Binomial requires the response to be a 2-class categorical or a binary column (0/1)".
Note that stats::glm and speedglm::speedglm in R can fit such a model (with a warning). In theory, the current implementation of logistic regression in h2o.glm should work as well if this were simply allowed. In practice, commenting out the above error check and commenting out the gains/lift table evaluation is enough to produce coefficients. These coefficients, however, are wrong for solver = "L_BFGS" but seem correct for solver = "IRLSM" (based only on a single tiny test set).
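A minimal reproduction sketch in base R (the simulated data and variable names are illustrative, not from the original report): stats::glm fits a fractional response in [0, 1] with only a warning, while the equivalent h2o.glm call, shown commented out since it needs a running H2O cluster, raises the error above.

```r
# Simulate a continuous outcome in [0, 1] (e.g. a pseudo-outcome
# from a sequential G-computation step).
set.seed(1)
n <- 200
x <- rnorm(n)
p <- plogis(0.5 + x)                              # true probabilities
y <- pmin(pmax(p + rnorm(n, sd = 0.05), 0), 1)    # clamp to [0, 1]

# stats::glm accepts this, emitting a "non-integer #successes" warning.
fit <- glm(y ~ x, family = binomial())
coef(fit)  # quasi-logistic coefficients

# The corresponding h2o.glm call currently errors instead:
# library(h2o); h2o.init()
# hf <- as.h2o(data.frame(y = y, x = x))
# h2o.glm(x = "x", y = "y", training_frame = hf,
#         family = "binomial", solver = "IRLSM")
# => Error: Binomial requires the response to be a 2-class categorical
#    or a binary column (0/1)
```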
One often has to run such logistic regressions when implementing Q-learning algorithms in survival analysis, such as the sequential G-computation algorithm, or when performing Targeted Minimum Loss-Based Estimation. I'd be happy to provide test code or use cases if desired.