Closed exalate-issue-sync[bot] closed 1 year ago
Arun Aryasomayajula commented: Any updates [~accountid:5d1185d4f46aa30c271c7cc6] ?
hannah.tillman commented: Hey [~accountid:5fa438f822f3990076aa232d] ! I am starting this ticket now 🙂 Currently, I am going through the schemas and Python docs for these estimators to find the needed information and gathering it all in a spreadsheet. I’ll mark this ticket as officially “in progress” when I start adding the gathered information to the user guide.
Narasimha Durgam commented: Hello [~accountid:5d1185d4f46aa30c271c7cc6]! I think adding more context on the parameters “spline_orders” and “bs” would be helpful! For example:
Please feel free to add or edit information as needed. Thank you!
Wendy Wong commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] asked me to ask [~accountid:6335b09597148a8301fd22dc] to help with this one.
Arun Aryasomayajula commented: [~accountid:5d1185d4f46aa30c271c7cc6] any updates on this JIRA?
hannah.tillman commented: [~accountid:5fa438f822f3990076aa232d] New PR currently being reviewed: https://github.com/h2oai/h2o-3/pull/6422
JIRA Issue Details
Jira Issue: PUBDEV-8461 Assignee: amin.sedaghat Reporter: Arun Aryasomayajula State: Resolved Fix Version: 3.42.0.1 Attachments: N/A Development PRs: Available
We feel that, in general, the documentation for some statistical models could be improved, particularly for the GAM in this case. Sometimes we need to investigate the source code to understand what a parameter does, how it relates to the more common names in the literature, or what range of values it accepts.
As a concrete example, the parameter ‘bs’, which sets the basis spline used for the fit, has a default value of 0, which corresponds to a cubic basis spline representation. No other values are listed, but looking into the source code (h2o-3/GAMModel.java at jenkins-3.34.0.3 · h2oai/h2o-3 (github.com)), we can find a few hints about its usage (for example, lines 215 and 216).
As developers ourselves, we understand that this kind of issue can be trickier to fix, but better documentation would make these models much easier to use.
P.S.: In general, I would say that the models that are more traditional in statistics than in data science (e.g. GLM with less typical distributions, GAM, and Cox models, as opposed to Random Forests / Gradient Boosting) are the ones that could use better documentation.