h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

add interaction to GLM Mojo #7621

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This Jira added support for interaction pairs to GLM Mojo. However, enum-by-enum interaction pairs that contain NAs are not supported yet. Support for enum-by-enum interaction pairs with NAs will be added in this JIRA: https://h2oai.atlassian.net/browse/PUBDEV-8130
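For readers new to the term, here is a minimal pure-Python sketch of what an enum-by-enum interaction column looks like and where NAs enter the picture. It is an illustration only, not H2O's internal encoding: the function name `enum_enum_interaction`, the `_` separator in the combined level name, and the rule that a missing value on either side makes the interaction missing are all assumptions made for this example.

```python
from typing import List, Optional


def enum_enum_interaction(
    a: List[Optional[str]], b: List[Optional[str]]
) -> List[Optional[str]]:
    """Combine two categorical columns row by row into one interaction column."""
    if len(a) != len(b):
        raise ValueError("columns must have the same length")
    out: List[Optional[str]] = []
    for left, right in zip(a, b):
        # Assumption for this sketch: an NA on either side yields an NA interaction.
        out.append(None if left is None or right is None else f"{left}_{right}")
    return out


c1 = ["red", "blue", None, "red"]
c2 = ["x", None, "y", "y"]
print(enum_enum_interaction(c1, c2))  # ['red_x', None, None, 'red_y']
```

The rows where either input is `None` are the case this Jira leaves unsupported: the scorer has to decide how a missing combined level maps to a coefficient, and that decision has to match what training did.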

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Met with Karthik today. Here is the summary:

Karthik will fix the failing tests on his open PR.

To understand interaction, check out this test: h2o-3/h2o-py/tests/testdir_algos/glm/pyunit_pubdev_6999_glm_interaction_NA.py.

To understand how the mojo is called and run, check out testBinomialPredMojoPojo() in GLMBasicTestBinomial.java in h2o-3/h2o-algos/src/test/glm.

When you are done fixing this, you will probably need to change pyunit_glm_interaction_MOJO_fail.py

If you have time, google how R supports interaction for GAM.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Met with Karthik today. He will start by understanding scoring with interaction pairs. Next, he will add interaction to GLM Mojo. This will involve changes in, but not limited to, GLMMojoWriter, GLMMojoReader, GLMMojoModelBase and others. Good luck!
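The writer/reader split mentioned above comes down to a round trip: whatever the writer stores about the interaction pairs, the reader must recover exactly before scoring can use it. The toy sketch below shows that idea in Python. It is not the H2O Java code; the key names `num_interaction_pairs` and `interaction_pair_<i>`, the `:` separator, and both function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def write_interactions(pairs: List[Pair]) -> Dict[str, str]:
    """Serialize interaction pairs into a flat key-value map (hypothetical keys)."""
    kv = {"num_interaction_pairs": str(len(pairs))}
    for i, (left, right) in enumerate(pairs):
        # Assumption: column names never contain ':' in this toy format.
        kv[f"interaction_pair_{i}"] = f"{left}:{right}"
    return kv


def read_interactions(kv: Dict[str, str]) -> List[Pair]:
    """Recover the interaction pairs written by write_interactions."""
    # A model saved without interactions simply has no count key.
    count = int(kv.get("num_interaction_pairs", "0"))
    pairs: List[Pair] = []
    for i in range(count):
        left, right = kv[f"interaction_pair_{i}"].split(":", 1)
        pairs.append((left, right))
    return pairs


pairs = [("c1", "n1"), ("c1", "c2")]
assert read_interactions(write_interactions(pairs)) == pairs
```

Defaulting the count to zero when the key is absent keeps models without interactions readable by the same reader, which is the kind of backward-compatibility concern a real format change would also need to handle.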

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: March 11, 2021:

Discussed with Karthik. This will be continued after he finishes the GLM compatibility check.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Met with Karthik today. He is working on how to incorporate interaction into the data read into the mojo.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: March 15, 2021

Met with Karthik today:

He has been working on MojoLand.

For adding GLM interaction to the Mojo, he finished GLMMojoReader and GLMMojoWriter. He will work on mojo scoring next.

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: April 1, 2021:

Karthik is getting close to finishing up the coding to incorporate feature interaction into the GLM mojo. I have come up with the following test cases:

The following families should be included: gaussian, binomial, multinomial.

In case of interactions, I want to propose the following:

exalate-issue-sync[bot] commented 1 year ago

Wendy commented: Karthik:

Write a GLM interaction test with the dataset generated from the following code:

{noformat}
from tests import pyunit_utils

seed = 12345
# Six categorical columns: one with 30 levels, one with 20, and four with 5.
bigCat = pyunit_utils.random_dataset_enums_only(10000, 1, factorL=30, misFrac=0.01, randSeed=seed)
bitCat2 = pyunit_utils.random_dataset_enums_only(10000, 1, factorL=20, misFrac=0.01, randSeed=seed)
smallCats = pyunit_utils.random_dataset_enums_only(10000, 4, factorL=5, misFrac=0.01, randSeed=seed)
# Five numeric columns (response, n1..n4) so the 11 names below match the 11 columns.
numerics = pyunit_utils.random_dataset_numeric_only(10000, 5, integerR=100, misFrac=0.01, randSeed=seed)
dataframe = numerics.cbind(smallCats.cbind(bitCat2.cbind(bigCat)))
dataframe.set_names(["response","n1","n2","n3","n4","c1","c2","c3","c4","c5","c6"])
interaction_pairs = [("c1", "n1"), ("c5", "n2"), ("c1", "c2"), ("c3", "c5"), ("n3", "n4")]
{noformat}

Family = Gaussian,

y = “response”,

interaction_pairs as specified

x = all columns in dataframe except response column
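The settings listed above could be wired up roughly as follows. This is a sketch under stated assumptions, not the test that was eventually written: the helper names `predictors_and_pairs` and `train_gaussian_glm` are hypothetical, and the H2O import is done lazily so the pure-Python helper can be checked without a running cluster. `H2OGeneralizedLinearEstimator` and its `family` and `interaction_pairs` parameters are part of the h2o Python package.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def predictors_and_pairs(
    names: Sequence[str], response: str, pairs: Sequence[Pair]
) -> List[str]:
    """Return x (every column except the response) after validating the pairs."""
    x = [n for n in names if n != response]
    known = set(x)
    for left, right in pairs:
        if left not in known or right not in known:
            raise ValueError(f"interaction pair ({left}, {right}) uses an unknown predictor")
    return x


def train_gaussian_glm(frame, names: Sequence[str], response: str, pairs: Sequence[Pair]):
    """Train a Gaussian GLM with interaction pairs; needs a running H2O cluster."""
    # Imported here so predictors_and_pairs works without an H2O installation.
    from h2o.estimators.glm import H2OGeneralizedLinearEstimator

    x = predictors_and_pairs(names, response, pairs)
    model = H2OGeneralizedLinearEstimator(family="gaussian", interaction_pairs=list(pairs))
    model.train(x=x, y=response, training_frame=frame)
    return model


names = ["response", "n1", "n2", "n3", "n4", "c1", "c2", "c3", "c4", "c5", "c6"]
pairs = [("c1", "n1"), ("c5", "n2"), ("c1", "c2"), ("c3", "c5"), ("n3", "n4")]
print(predictors_and_pairs(names, "response", pairs))
```

Validating the pairs before training gives a clear error when a pair refers to a column that is not among the predictors, instead of a failure later during model building.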

Thanks, Wendy

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8027
Assignee: Karthik Murthy
Reporter: Wendy
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/5420