h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Add argument descriptions to H2OAutoML docstring in Python #11446

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

We don't have descriptions of the h2o.automl.H2OAutoML constructor arguments. Let's add them (preferably generated automatically from the Java descriptions).

{code}
In [3]: from h2o.automl import H2OAutoML

In [4]: ?H2OAutoML
Init signature: H2OAutoML(self, max_runtime_secs=None, max_models=None, stopping_metric=None, stopping_tolerance=None, stopping_rounds=None, seed=None, project_name=None)
Docstring:
Automatic Machine Learning

The Automatic Machine Learning (AutoML) function automates the supervised machine learning model training process. The current version of AutoML trains and cross-validates a Random Forest, an Extremely-Randomized Forest, a random grid of Gradient Boosting Machines (GBMs), a random grid of Deep Neural Nets, and a Stacked Ensemble of all the models.

:examples:

Setting up an H2OAutoML object

project_name = "Project1"
aml = H2OAutoML(max_runtime_secs=30, project_name=project_name)
File: /usr/local/lib/python2.7/site-packages/h2o/automl/autoh2o.py
Type: type
{code}

Note that the train method has its own arguments, and they are, in fact, listed in its docstring -- we are only missing half of the argument descriptions (the half that belongs to the constructor).

{code}
In [5]: ?H2OAutoML.train
Signature: H2OAutoML.train(self, x=None, y=None, training_frame=None, fold_column=None, weights_column=None, validation_frame=None, leaderboard_frame=None)
Docstring:
Begins an AutoML task, a background task that automatically builds a number of models with various algorithms and tracks their performance in a leaderboard. At any point in the process you may use H2O's performance or prediction functions on the resulting models.

:param x: A list of column names or indices indicating the predictor columns.
:param y: An index or a column name indicating the response column.
:param fold_column: The name or index of the column in training_frame that holds per-row fold assignments.
:param weights_column: The name or index of the column in training_frame that holds per-row weights.
:param training_frame: The H2OFrame having the columns indicated by x and y (as well as any additional columns specified by fold, offset, and weights).
:param validation_frame: H2OFrame with validation data to be scored on while training.
:param leaderboard_frame: H2OFrame with test data to be scored on in the leaderboard.

:returns: An H2OAutoML object.

:examples:

Set up an H2OAutoML object

aml = H2OAutoML(max_runtime_secs=30)

Launch H2OAutoML

aml.train(y=y, training_frame=training_frame)
File: /usr/local/lib/python2.7/site-packages/h2o/automl/autoh2o.py
Type: instancemethod
{code}
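The gap described in this issue (constructor arguments undocumented, train arguments documented) can be checked mechanically. The following is a minimal sketch, not part of h2o: `undocumented_params` and `FakeAutoML` are hypothetical names, and `FakeAutoML` is only a stand-in that copies the two signatures quoted above, with the train descriptions abbreviated. It assumes Python 3 and the plain `:param name:` form; the `:param type name:` variant is not handled.

```python
import inspect
import re


def undocumented_params(func):
    """Return the parameters of `func` that have no ':param name:' docstring entry."""
    doc = inspect.getdoc(func) or ""
    documented = set(re.findall(r":param\s+(\w+)\s*:", doc))
    names = [n for n in inspect.signature(func).parameters if n != "self"]
    return [n for n in names if n not in documented]


class FakeAutoML(object):
    """Hypothetical stand-in mirroring the quoted signatures; not the real class."""

    def __init__(self, max_runtime_secs=None, max_models=None, stopping_metric=None,
                 stopping_tolerance=None, stopping_rounds=None, seed=None,
                 project_name=None):
        """Automatic Machine Learning"""

    def train(self, x=None, y=None, training_frame=None, fold_column=None,
              weights_column=None, validation_frame=None, leaderboard_frame=None):
        """
        Begins an AutoML task.

        :param x: predictor columns (abbreviated)
        :param y: response column (abbreviated)
        :param fold_column: fold assignment column (abbreviated)
        :param weights_column: row weights column (abbreviated)
        :param training_frame: training data (abbreviated)
        :param validation_frame: validation data (abbreviated)
        :param leaderboard_frame: leaderboard data (abbreviated)
        """


missing_init = undocumented_params(FakeAutoML.__init__)
missing_train = undocumented_params(FakeAutoML.train)
print("constructor:", missing_init)
print("train:", missing_train)
```

Run against the real class, a check like this could be pointed at `H2OAutoML.__init__` and `H2OAutoML.train` instead of the stand-in.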

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: Please also update the example code to be a fully reproducible example based on the one in the user guide: http://h2o-release.s3.amazonaws.com/h2o/master/3905/docs-website/h2o-docs/automl.html

{code}
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Import a sample binary outcome train/test set into H2O
train = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
x = train.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
train[y] = train[y].asfactor()
test[y] = test[y].asfactor()

# Run AutoML for 30 seconds
aml = H2OAutoML(max_runtime_secs = 30)
aml.train(x = x, y = y, training_frame = train, leaderboard_frame = test)
{code}

exalate-issue-sync[bot] commented 1 year ago

Lauren DiPerna commented: I will add the docstring and doc build updates manually. We can create another JIRA ticket if we want the doc updates to be part of the autogen file.
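For reference, the automatic generation mentioned above could look roughly like the sketch below. This only illustrates the idea; it does not show how H2O's autogen actually works, which this thread does not describe. The function name `render_param_docs` and the placeholder description strings are hypothetical, and dictionary ordering is assumed to be preserved (Python 3.7+).

```python
def render_param_docs(descriptions):
    """Build Sphinx-style ':param name: text' lines from an ordered mapping."""
    return "\n".join(
        ":param {}: {}".format(name, text) for name, text in descriptions.items()
    )


# Placeholder text standing in for descriptions that would come from the Java side.
example_descriptions = {
    "max_runtime_secs": "<description from Java>",
    "seed": "<description from Java>",
}

docstring_body = render_param_docs(example_descriptions)
print(docstring_body)
```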

exalate-issue-sync[bot] commented 1 year ago

Lauren DiPerna commented: I'm going to put this under miscellaneous, but it can be updated later if needed.

hasithjp commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-4563
Assignee: Lauren DiPerna
Reporter: Erin LeDell
State: Closed
Fix Version: 3.10.5.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/1246