h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0
6.89k stars 1.99k forks

Reorganize algorithm parameters: Aggregator #6734

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

{noformat}Defining an Aggregator Model



Parameters are optional unless specified as *required*.

Algorithm-specific parameters
'''''''''''''''''''''''''''''

-  **target_num_exemplars**: Specify the target number of exemplars. This value defaults to ``5000``.

-  **rel_tol_num_exemplars**: Specify the relative tolerance for the number of exemplars (for example, ``0.5`` means +/- 50 percent of **target_num_exemplars**). This value defaults to ``0.5``.

-  **save_mapping_frame**: When this option is enabled, a frame that maps the rows of the aggregated frame to the rows of the original (raw) frame is created and exported. This option defaults to ``False`` (disabled).

-  **num_iteration_without_new_exemplar**: Specify the number of iterations to run before the Aggregator exits if the number of collected exemplars does not change. This value defaults to ``500``.
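The algorithm-specific parameters interact: ``target_num_exemplars`` and ``rel_tol_num_exemplars`` define an accepted range of exemplar counts, ``save_mapping_frame`` records which exemplar each raw row was assigned to, and ``num_iteration_without_new_exemplar`` bounds how long the search continues once the count stops changing. The Python sketch below is a toy illustration under stated assumptions, not H2O's Aggregator implementation: it uses a hypothetical greedy "leader" pass with a Euclidean radius that is enlarged until the count fits, and the helper names (``exemplar_bounds``, ``leader_aggregate``, ``aggregate_to_target``) and the ``radius``/``growth`` knobs are invented for illustration.

```python
import math


def exemplar_bounds(target_num_exemplars=5000, rel_tol_num_exemplars=0.5):
    """Accepted exemplar-count range: target +/- rel_tol * target."""
    return (target_num_exemplars * (1 - rel_tol_num_exemplars),
            target_num_exemplars * (1 + rel_tol_num_exemplars))


def leader_aggregate(rows, radius):
    """Toy single pass (hypothetical): a row joins the first exemplar within
    `radius` (Euclidean distance); otherwise it becomes a new exemplar.
    Returns (exemplars, mapping), where mapping[i] is the exemplar index for
    input row i, a toy analogue of the frame saved by save_mapping_frame."""
    exemplars, mapping = [], []
    for row in rows:
        for idx, ex in enumerate(exemplars):
            if math.dist(row, ex) <= radius:
                mapping.append(idx)
                break
        else:  # no exemplar close enough: this row becomes one
            exemplars.append(row)
            mapping.append(len(exemplars) - 1)
    return exemplars, mapping


def aggregate_to_target(rows, target_num_exemplars=5000, rel_tol_num_exemplars=0.5,
                        num_iteration_without_new_exemplar=500,
                        radius=0.1, growth=2.0):
    """Enlarge the radius until the exemplar count is at most the upper bound,
    or until the count has stayed the same for the given number of iterations
    (assumed reading of num_iteration_without_new_exemplar).
    `radius` (must be > 0) and `growth` are hypothetical tuning knobs."""
    _, high = exemplar_bounds(target_num_exemplars, rel_tol_num_exemplars)
    exemplars, mapping = leader_aggregate(rows, radius)
    unchanged = 0
    while len(exemplars) > high and unchanged < num_iteration_without_new_exemplar:
        radius *= growth
        new_exemplars, new_mapping = leader_aggregate(rows, radius)
        # Count consecutive iterations in which the exemplar count did not change
        unchanged = unchanged + 1 if len(new_exemplars) == len(exemplars) else 0
        exemplars, mapping = new_exemplars, new_mapping
    return exemplars, mapping
```

With the defaults, ``exemplar_bounds()`` returns ``(2500.0, 7500.0)``. Because this toy only ever enlarges the radius, it can overshoot below the lower bound; a fuller search would also need to shrink the radius in that case.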

Common parameters
'''''''''''''''''

-  `training_frame <algo-params/training_frame.html>`__: *Required*. Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically.

-  `x <algo-params/x.html>`__: Specify a vector containing the character names of the predictors in the model.

-  `model_id <algo-params/model_id.html>`__: Specify a custom name for the model to use as a reference. By default, H2O automatically generates a destination key.

-  `ignore_const_cols <algo-params/ignore_const_cols.html>`__: Enable this option to ignore constant training columns, since no information can be gained from them. This option defaults to ``True`` (enabled).

-  `transform <algo-params/transform.html>`__: Specify one of the following transformation methods for numeric columns in the training data:

  - ``"none"``
  - ``"standardize"``
  - ``"normalize"`` (default)
  - ``"demean"``
  - ``"descale"``

-  `categorical_encoding <algo-params/categorical_encoding.html>`__: Specify one of the following encoding schemes for handling categorical features (defaults to ``AUTO``):

  - ``auto`` or ``AUTO``: Allow the algorithm to decide (default). In Aggregator, the algorithm will automatically perform ``enum`` encoding.
  - ``one_hot_internal`` or ``OneHotInternal``: On-the-fly N+1 new columns for categorical features with N levels.
  - ``binary`` or ``Binary``: No more than 32 columns per categorical feature.
  - ``eigen`` or ``Eigen``: *k* columns per categorical feature, keeping only the projections of the one-hot-encoded matrix onto the *k*-dimensional eigen space.
  - ``label_encoder`` or ``LabelEncoder``: Convert every enum into the integer of its index (for example, level 0 -> 0, level 1 -> 1, etc.).
  - ``enum_limited`` or ``EnumLimited``: Automatically reduce the categorical levels during Aggregator training, keeping only the **T** (10) most frequent levels.

-  `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will be automatically exported.{noformat}
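To make the `transform` options concrete, here is a minimal Python sketch that applies each one to a single numeric column. It assumes the conventional definitions (`demean` subtracts the column mean, `descale` divides by the standard deviation, `standardize` does both) and uses the sample standard deviation; for `normalize` it assumes min-max scaling to [0, 1], which may differ from the exact centering H2O applies. The function name `transform_column` is hypothetical.

```python
import statistics

TRANSFORMS = ("none", "standardize", "normalize", "demean", "descale")


def transform_column(values, method="normalize"):
    """Toy transform of one numeric column; the definitions are assumptions."""
    if method not in TRANSFORMS:
        raise ValueError(f"unknown transform: {method!r}")
    values = [float(v) for v in values]
    if method == "none":
        return values
    if method == "normalize":
        # Assumed min-max scaling to [0, 1]
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    mean = statistics.fmean(values)
    if method == "demean":
        return [v - mean for v in values]
    sd = statistics.stdev(values)  # sample (n - 1) standard deviation, an assumption
    if method == "descale":
        return [v / sd for v in values]
    return [(v - mean) / sd for v in values]  # standardize
```

A constant column would cause a division by zero in this sketch; by default, `ignore_const_cols` drops such columns before training.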
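The `label_encoder` and `enum_limited` schemes can be illustrated in a few lines of Python. This is a sketch under assumptions: levels are indexed in lexicographic order, T defaults to 10 as stated in the parameter list, and levels outside the top T are collapsed into a hypothetical `"other"` placeholder (the text does not say what happens to them). The helper names `label_encode` and `enum_limit` are hypothetical.

```python
from collections import Counter


def label_encode(values):
    """Map each level to its index in the sorted level list
    (lexicographic ordering is an assumption)."""
    levels = sorted(set(values))
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values], levels


def enum_limit(values, t=10, other="other"):
    """Keep the t most frequent levels; collapse the rest into a
    hypothetical placeholder level. Ties keep first-seen order."""
    keep = {level for level, _ in Counter(values).most_common(t)}
    return [v if v in keep else other for v in values]
```

For example, `label_encode(["b", "a", "c", "a"])` returns `([1, 0, 2, 0], ["a", "b", "c"])`.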

h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-9066
Assignee: hannah.tillman
Reporter: hannah.tillman
State: Resolved
Fix Version: 3.40.0.4
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/6708