h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.78k stars 1.99k forks

Automatic Feature Engineering for GLM #6569

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

GLM generally requires the most feature engineering because it fits a linear model: nonlinear effects and interactions are captured only if they are supplied as explicit features. This request is to apply common feature engineering as a preprocessing step when training GLM in AutoML.
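To illustrate why a linear model depends on feature engineering, here is a minimal sketch in plain Python (not H2O's API). It fits a one-variable least-squares line to a purely quadratic target, once on the raw input and once on a squared feature. The helper names `fit_simple_ols` and `r_squared` are hypothetical and exist only for this example.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys):
    """Coefficient of determination of the fitted line on the same data."""
    intercept, slope = fit_simple_ols(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Inputs symmetric around zero, target is exactly quadratic in x.
xs = [x / 2 for x in range(-10, 11)]
ys = [x * x for x in xs]

raw_r2 = r_squared(xs, ys)                       # linear fit on raw x
squared_r2 = r_squared([x * x for x in xs], ys)  # linear fit on engineered x^2

print(f"R^2 with raw x:     {raw_r2:.3f}")
print(f"R^2 with x squared: {squared_r2:.3f}")
```

On this toy data the raw feature explains none of the variance, while the engineered feature explains all of it. Real datasets will sit somewhere in between, which is what the benchmarking step is meant to quantify.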

Steps:

1. Write up the feature engineering in the Python or R API.
2. Benchmark how much GLMs improve from the feature engineering.
3. Add it to the Java code base.
4. Add MOJO support.
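As a starting point for the first step, the following is a rough sketch of what a feature engineering pass might look like, written in plain Python rather than against the H2O API. The issue does not say which transformations to include, so the ones here (squared terms, `log1p`, and pairwise products) are assumptions, and `engineer_features` is a hypothetical name.

```python
import math
from itertools import combinations


def engineer_features(columns):
    """Expand numeric columns with squared, log1p, and pairwise-product features.

    `columns` maps a column name to a list of numbers; all lists are
    assumed to have the same length. The original columns are kept.
    """
    out = {name: list(values) for name, values in columns.items()}
    for name, values in columns.items():
        out[f"{name}_sq"] = [v * v for v in values]
        # log1p is only defined for v > -1, so skip columns that violate that.
        if all(v > -1 for v in values):
            out[f"log1p_{name}"] = [math.log1p(v) for v in values]
    # Pairwise interactions, with names ordered alphabetically for stability.
    for a, b in combinations(sorted(columns), 2):
        out[f"{a}_x_{b}"] = [u * v for u, v in zip(columns[a], columns[b])]
    return out


# Hypothetical toy data, for illustration only.
frame = {"income": [0.0, 10.0, 100.0], "age": [20.0, 35.0, 50.0]}
engineered = engineer_features(frame)
print(sorted(engineered))
```

For the benchmarking step, one option would be to train a GLM on the original columns and another on the expanded columns, then compare a holdout metric between the two.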

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8777
Assignee: New H2O Bugs
Reporter: Megan Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

wendycwong commented 1 year ago

This is a good idea, and I think it should be added to other algos as well. We will need to figure out which feature engineering methods are most common and then decide which ones to add.