h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Documentation for Generalized Additive Models (GAM) could be improved #7200

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We feel that, in general, the documentation for some statistical models could be improved, particularly for GAM. Sometimes we need to read the source code to understand what a parameter does, how it relates to the names more commonly used in the literature, or which range of values it accepts.

As a concrete example, the parameter ‘bs’, which sets the basis spline fit, has a default value of 0, which corresponds to a cubic basis spline representation. No other values are listed, but the source code (h2o-3/GAMModel.java at jenkins-3.34.0.3 in h2oai/h2o-3 on GitHub) gives a few hints about its usage (for example, lines 215 and 216).

As developers ourselves, we understand that this one can be tricky to fix, but it would greatly improve the usability of these models.

P.S.: In general, I would say that the models that are more traditional in statistics than in data science (e.g., GLM with less common distributions, GAM, and Cox models, as opposed to Random Forests or Gradient Boosting) are the ones that most need better documentation.

exalate-issue-sync[bot] commented 1 year ago

Arun Aryasomayajula commented: Any updates, [~accountid:5d1185d4f46aa30c271c7cc6]?

exalate-issue-sync[bot] commented 1 year ago

hannah.tillman commented: Hey [~accountid:5fa438f822f3990076aa232d]! I am starting this ticket now 🙂 Currently, I am going through the schemas and Python docs for these estimators to find the needed information and gathering it all in a spreadsheet. I’ll mark this ticket as officially “in progress” when I start adding the gathered information to the user guide.

exalate-issue-sync[bot] commented 1 year ago

Narasimha Durgam commented: Hello [~accountid:5d1185d4f46aa30c271c7cc6]! I think adding more context on the parameters “spline_orders” and “bs” would be helpful. For example:

The spline_orders parameter specifies the order of the polynomials used in monotone splines. For example, spline_orders=3 means a polynomial of order 3 will be used in the splines.
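The word "order" is ambiguous across the literature. In the common B-spline convention (de Boor), a spline of order k is built from polynomial pieces of degree k - 1, so "order 3" would mean quadratic pieces. Whether H2O follows this convention for every spline type is not stated in this thread and should be checked against the user guide. The sketch below is a generic Cox-de Boor recursion, not H2O's implementation; the function name `bspline_basis` and the knot vector are hypothetical choices for illustration.

```python
# Generic Cox-de Boor recursion for B-spline basis functions (not H2O code).
# Assumed convention: order k means piecewise polynomials of degree k - 1.

def bspline_basis(knots, order, x):
    """Evaluate all B-spline basis functions of the given order at x.

    knots: non-decreasing knot sequence (already clamped/padded).
    order: k >= 1; order 1 is piecewise constant, order 4 is cubic.
    Returns a list of len(knots) - order basis values.
    """
    # Order 1: indicator of each half-open knot interval [t_i, t_{i+1}).
    values = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
              for i in range(len(knots) - 1)]
    # Close the last non-empty interval so x equal to the last knot is covered.
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                values[i] = 1.0
                break
    # Raise the order one step at a time; 0/0 terms are treated as 0.
    for k in range(2, order + 1):
        new_values = []
        for i in range(len(knots) - k):
            left = 0.0
            denom = knots[i + k - 1] - knots[i]
            if denom > 0:
                left = (x - knots[i]) / denom * values[i]
            right = 0.0
            denom = knots[i + k] - knots[i + 1]
            if denom > 0:
                right = (knots[i + k] - x) / denom * values[i + 1]
            new_values.append(left + right)
        values = new_values
    return values
```

For example, with the clamped knots `[0, 0, 0, 0, 1, 2, 3, 3, 3, 3]` and `order=4`, there are six cubic (degree 3) basis functions, and at any x in [0, 3] their values sum to 1.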

The bs parameter (raised in the ticket description) selects the spline type. The accepted values currently range from 0 to 2: 0 for cubic splines, 1 for thin-plate splines, and 2 for monotone splines.
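As a minimal sketch of how these codes could be validated and described, assuming (as in H2O's GAM estimator) that bs takes one code per entry in gam_columns and that an omitted bs means 0 (cubic) for every column. The helper `describe_bs`, the `BS_SPLINE_TYPES` table, and the column names are hypothetical, and the 0 to 2 range reflects only what the comment above states; newer releases may accept other codes.

```python
# Hypothetical helper (not part of H2O): maps the bs codes listed in this
# thread to spline types. Assumes one bs code per GAM column, default 0.

BS_SPLINE_TYPES = {
    0: "cubic",
    1: "thin-plate",
    2: "monotone",
}


def describe_bs(gam_columns, bs=None):
    """Return {column: spline type} for the given gam_columns and bs codes."""
    if bs is None:
        # 0 (cubic) is the default described in the ticket.
        bs = [0] * len(gam_columns)
    if len(bs) != len(gam_columns):
        raise ValueError("bs must have one entry per GAM column")
    result = {}
    for column, code in zip(gam_columns, bs):
        if code not in BS_SPLINE_TYPES:
            raise ValueError(
                f"bs value {code!r} for column {column!r} is outside 0..2")
        result[column] = BS_SPLINE_TYPES[code]
    return result
```

For example, `describe_bs(["age", "income"], [0, 2])` returns `{"age": "cubic", "income": "monotone"}`, and `describe_bs(["age"])` falls back to the cubic default.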

Please feel free to add or edit information as needed. Thank you!

exalate-issue-sync[bot] commented 1 year ago

Wendy Wong commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] asked me to ask [~accountid:6335b09597148a8301fd22dc] to help with this one.

exalate-issue-sync[bot] commented 1 year ago

Arun Aryasomayajula commented: [~accountid:5d1185d4f46aa30c271c7cc6] any updates on this JIRA?

exalate-issue-sync[bot] commented 1 year ago

hannah.tillman commented: [~accountid:5fa438f822f3990076aa232d] New PR currently being reviewed: https://github.com/h2oai/h2o-3/pull/6422

h2o-ops-ro commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8461
Assignee: amin.sedaghat
Reporter: Arun Aryasomayajula
State: Resolved
Fix Version: 3.42.0.1
Attachments: N/A
Development PRs: Available

h2o-ops-ro commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/6251
https://github.com/h2oai/h2o-3/pull/6422
https://github.com/h2oai/h2o-3/pull/6724