H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Parameters are optional unless specified as *required*.
Algorithm-specific parameters
'''''''''''''''''''''''''''''
- **target_num_exemplars**: Specify a value for the targeted number of exemplars. This value defaults to ``5000``.
- **rel_tol_num_exemplars**: Specify the relative tolerance for the number of exemplars (e.g., ``0.5`` is +/- 50 percent). This value defaults to ``0.5``.
- **save_mapping_frame**: When this option is enabled, a mapping from rows in the aggregated frame to rows in the original (raw) frame is created and exported. This option defaults to ``False`` (disabled).
- **num_iteration_without_new_exemplar**: Specify the number of iterations to run before Aggregator exits if the number of exemplars collected does not change. This value defaults to ``500``.
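The interaction between ``target_num_exemplars`` and ``rel_tol_num_exemplars`` can be sketched in plain Python. This is only an illustration of the "+/- 50 percent" arithmetic described above, not H2O's internal implementation; the helper names ``exemplar_bounds`` and ``within_tolerance`` are hypothetical.

```python
def exemplar_bounds(target_num_exemplars=5000, rel_tol_num_exemplars=0.5):
    """Return the (low, high) range of exemplar counts inside the tolerance band."""
    low = target_num_exemplars * (1 - rel_tol_num_exemplars)
    high = target_num_exemplars * (1 + rel_tol_num_exemplars)
    # Round to whole exemplars; this rounding rule is an assumption of the sketch.
    return round(low), round(high)


def within_tolerance(num_exemplars, target_num_exemplars=5000, rel_tol_num_exemplars=0.5):
    """Check whether an exemplar count falls inside the tolerance band."""
    low, high = exemplar_bounds(target_num_exemplars, rel_tol_num_exemplars)
    return low <= num_exemplars <= high


if __name__ == "__main__":
    # With the documented defaults, the band spans 2500 to 7500 exemplars.
    print(exemplar_bounds())
    print(within_tolerance(6000))
```

With the default values, any result between 2500 and 7500 exemplars would count as close enough to the target under this reading of the tolerance.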
Common parameters
'''''''''''''''''
- `training_frame <algo-params/training_frame.html>`__: *Required* Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically.
- `x <algo-params/x.html>`__: Specify a vector containing the character names of the predictors in the model.
- `model_id <algo-params/model_id.html>`__: Specify a custom name for the model to use as a reference. By default, H2O automatically generates a destination key.
- `ignore_const_cols <algo-params/ignore_const_cols.html>`__: Enable this option to ignore constant training columns, since no information can be gained from them. This option defaults to ``True`` (enabled).
- `transform <algo-params/transform.html>`__: Specify the transformation method for numeric columns in the training data. One of:

  - ``"none"``
  - ``"standardize"``
  - ``"normalize"`` (default)
  - ``"demean"``
  - ``"descale"``

- `categorical_encoding <algo-params/categorical_encoding.html>`__: Specify one of the following encoding schemes for handling categorical features (defaults to ``AUTO``):

  - ``auto`` or ``AUTO``: Allow the algorithm to decide (default). In Aggregator, the algorithm will automatically perform ``enum`` encoding.
  - ``one_hot_internal`` or ``OneHotInternal``: Create N+1 new columns on the fly for each categorical feature with N levels.
  - ``binary``: Use no more than 32 columns per categorical feature.
  - ``eigen`` or ``Eigen``: Use *k* columns per categorical feature, keeping only the projections of the one-hot-encoded matrix onto a *k*-dimensional eigen space.
  - ``label_encoder`` or ``LabelEncoder``: Convert every enum into the integer of its index (for example, level 0 -> 0, level 1 -> 1, etc.).
  - ``enum_limited`` or ``EnumLimited``: Automatically reduce categorical levels to the most prevalent ones during Aggregator training and only keep the **T** (10) most frequent levels.

- `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will be automatically exported.
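Three of the ``categorical_encoding`` schemes listed above can be illustrated with a small, self-contained Python sketch. It does not call H2O and is not how H2O implements these encodings. The function names are hypothetical. Two details are assumptions of the sketch rather than statements from this page: the extra (N+1th) one-hot column is used for missing values, and levels dropped by ``enum_limited`` are folded into a single ``"other"`` bucket.

```python
from collections import Counter


def label_encode(values, levels):
    """label_encoder: map each value to the integer index of its level."""
    index = {level: i for i, level in enumerate(levels)}
    return [index[v] for v in values]


def one_hot_encode(values, levels):
    """one_hot_internal: N+1 indicator columns for a feature with N levels.

    Assumption: the last column flags missing values (None).
    """
    index = {level: i for i, level in enumerate(levels)}
    rows = []
    for v in values:
        row = [0] * (len(levels) + 1)
        row[index[v] if v is not None else len(levels)] = 1
        rows.append(row)
    return rows


def limit_levels(values, top_t=10, other="other"):
    """enum_limited: keep only the top_t most frequent levels.

    Assumption: all remaining levels are replaced by a single `other` bucket.
    """
    keep = {level for level, _ in Counter(values).most_common(top_t)}
    return [v if v in keep else other for v in values]


if __name__ == "__main__":
    levels = ["blue", "green", "red"]
    print(label_encode(["red", "blue", "red"], levels))
    print(one_hot_encode(["green", None], levels))
    print(limit_levels(["a", "a", "a", "b", "b", "c"], top_t=2))
```

The sketch is meant only to show the shape of the output for each scheme: a single integer column for ``label_encoder``, N+1 indicator columns for ``one_hot_internal``, and a reduced set of levels for ``enum_limited``.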
Defining an Aggregator Model