h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.9k stars 2k forks

Multi-Output Regression Models #16431

Open narasimhard opened 1 day ago

narasimhard commented 1 day ago

It would be fantastic to have models that can handle multi-target regression problems, that is, scenarios where the same set of independent variables is used to learn and predict multiple dependent outcomes. Training would be simpler because only one model is needed, and deployment would be easier too.

wendycwong commented 1 day ago

@narasimhard

I need some clarification here. Suppose I have one set of predictors and want to build a model that predicts, say, dependent outcomes 1, 2, and 3. If we are thinking about using GLM, are you saying that we want three sets of parameters, one for each of the 3 outcomes?

Thanks, Wendy

tomasfryda commented 1 day ago

@narasimhard please correct me if I'm wrong, but I think the point could be joint estimation of the parameters (e.g. betas). @wendycwong GLM is a bit special in this area, since there is a generalization of GLM that tries to do that: the vector generalized linear model (VGLM). It still uses N sets of "betas", but they differ from the betas estimated independently by N separate GLMs, i.e., the N sets of betas can take advantage of the correlation structure in the data to build a better estimator.
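To make the "N sets of betas" layout concrete, here is a minimal sketch in plain Python. It is not H2O code: the helper `solve_normal_equations` and the toy data are made up for illustration, and it covers only ordinary least squares, not a VGLM. In this special case (identical predictors for every outcome, no extra constraints) the stacked fit and the separate fits return the same coefficients, so any gain from joint estimation would have to come from additional structure that a VGLM-style model introduces.

```python
def solve_normal_equations(X, Y):
    """Least-squares coefficients B for Y ~ X B, one column of B per target.

    X: n rows of p predictors, Y: n rows of k targets (lists of lists).
    Solves (X^T X) B = X^T Y by Gauss-Jordan elimination with partial pivoting.
    Hypothetical helper for illustration only; assumes X^T X is invertible.
    """
    n, p, k = len(X), len(X[0]), len(Y[0])
    # Augmented matrix [X^T X | X^T Y] with p rows and p + k columns.
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
        + [sum(X[r][i] * Y[r][t] for r in range(n)) for t in range(k)]
        for i in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    # The right-hand block now holds B (p x k).
    return [row[p:] for row in A]


# Toy data: an intercept column plus two predictors, and three outcomes.
X = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0], [1.0, 2.0, 1.0], [1.0, 3.0, 5.0], [1.0, 4.0, 2.0]]
Y = [[1.0, 0.5, 2.0], [2.1, 0.1, 1.0], [2.9, 1.2, 0.5], [4.2, 3.9, -1.0], [5.1, 2.2, -1.5]]

# One fit for all three outcomes: each column of B_joint is one "set of betas".
B_joint = solve_normal_equations(X, Y)

# Three separate single-outcome fits on the same predictors.
B_separate = [solve_normal_equations(X, [[row[t]] for row in Y]) for t in range(3)]
```

Each column of `B_joint` lines up with the single column of the corresponding entry in `B_separate`, which is the "three sets of parameters for the 3 outcomes" picture from the question above.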

For DeepLearning, I think the solution would be just more neurons in the output layer.

Am I correct @narasimhard ?

narasimhard commented 1 day ago

Yes, that's what I am thinking as well. With DL models, the output layer would have more neurons, one per target variable, so a single network can predict all N targets. @tomasfryda @wendycwong
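The "more neurons in the output layer" idea can be sketched as follows. This is a plain-Python illustration under stated assumptions, not H2O's DeepLearning implementation: the function names, layer sizes, and random initialization are all hypothetical, and only the forward pass and loss are shown (no training).

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (n_out x n_in) and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(weights, biases, inputs):
    """Affine map: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def predict(network, features):
    """Forward pass: tanh hidden layer, then a linear output layer
    with one neuron per regression target."""
    (w_hidden, b_hidden), (w_out, b_out) = network
    hidden = [math.tanh(v) for v in dense(w_hidden, b_hidden, features)]
    return dense(w_out, b_out, hidden)


def squared_error(predictions, targets):
    """A single scalar loss, summed over all targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets))


rng = random.Random(0)
n_features, n_hidden, n_targets = 4, 8, 3  # illustrative sizes only
network = (init_layer(n_features, n_hidden, rng), init_layer(n_hidden, n_targets, rng))

# One row of predictors in, one prediction per target out.
outputs = predict(network, [0.2, -1.0, 0.5, 3.0])
loss = squared_error(outputs, [1.0, 0.0, -2.0])
```

In this sketch the hidden layer is shared by all three outputs, and only the final layer is target-specific. That sharing is where a single multi-output network could, in principle, exploit relationships between the outcomes.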