h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Reorganize algorithm parameters: CoxPH #6760

Closed exalate-issue-sync[bot] closed 1 year ago

exalate-issue-sync[bot] commented 1 year ago

{noformat}
Defining a CoxPH Model

Parameters are optional unless specified as *required*.

Algorithm-specific parameters
'''''''''''''''''''''''''''''

-  `stop_column <algo-params/stop_column.html>`__: *Required* The name of an integer column in the **source** data set representing the stop time. 

-  `start_column <algo-params/start_column.html>`__: The name of an integer column in the **source** data set representing the start time. If supplied, the value of the ``start_column`` must be strictly less than the ``stop_column`` in each row.

-  `stratify_by <algo-params/stratify_by.html>`__: A list of columns to use for stratification.

-  `ties <algo-params/ties.html>`__: The approximation method for handling ties in the partial likelihood. This can be either ``efron`` (default) or ``breslow``. See the :ref:`coxph_model_details` section below for more information about these options.

-  `init <algo-params/init2.html>`__: Initial values for the coefficients in the model. This value defaults to ``0``.

-  `lre_min <algo-params/lre_min.html>`__: A positive number to use as the minimum log-relative error (LRE) of subsequent log partial likelihood calculations to determine algorithmic convergence. The role this parameter plays in the stopping criteria of the model fitting algorithm is explained in the :ref:`coxph_algorithm` section below. This value defaults to ``9``.

-  `interactions <algo-params/interactions.html>`__: Specify a list of predictor column indices to interact. All pairwise combinations will be computed for this list. 

-  `interaction_pairs <algo-params/interaction_pairs.html>`__: (Internal only) When defining interactions, use this option to specify a list of pairwise column interactions (interactions between two variables). Note that this is different from ``interactions``, which computes all pairwise combinations of the specified columns. This option defaults to ``False`` (disabled).

-  **interactions_only**: A list of columns that should only be used to create interactions and should not themselves participate in model training.
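
The difference between ``interactions`` and ``interaction_pairs`` can be sketched in plain Python. This is only an illustration of the "all pairwise combinations" wording above, not H2O's internal implementation; the column names and the ``expand_interactions`` helper are hypothetical.

```python
from itertools import combinations


def expand_interactions(columns):
    """Return every unordered pair of the given columns.

    Mirrors the "all pairwise combinations" behavior described for
    ``interactions``; this helper is hypothetical and not part of H2O.
    """
    return list(combinations(columns, 2))


# Hypothetical predictor names, used only for illustration.
interactions = ["age", "year", "surgery"]

# ``interactions``: every pair is generated from the list.
all_pairs = expand_interactions(interactions)

# ``interaction_pairs``: only the pairs listed explicitly are used.
interaction_pairs = [("age", "year")]

print(all_pairs)
print(interaction_pairs)
```

With three columns, ``interactions`` yields three pairs, whereas ``interaction_pairs`` limits the model to the pairs you name.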

Common parameters
'''''''''''''''''

-  `training_frame <algo-params/training_frame.html>`__: *Required* Specify the dataset used to build the model. **NOTE**: In Flow, if you click the **Build a model** button from the ``Parse`` cell, the training frame is entered automatically.

-  `y <algo-params/y.html>`__ (Python) / **event_column** (R): *Required* Specify the column to use as the dependent variable. The data can be numeric or categorical.

-  `x <algo-params/x.html>`__: Specify a vector containing the names or indices of the predictor variables to use when building the model. If ``x`` is missing, then all columns except ``y`` are used.

-  `model_id <algo-params/model_id.html>`__: Specify a custom name for the model to use as a reference. By default, H2O automatically generates a destination key.

-  `ignored_columns <algo-params/ignored_columns.html>`__: (Python and Flow only) Specify the column or columns to be excluded from the model. In Flow, click the checkbox next to a column name to add it to the list of columns excluded from the model. To add all columns, click the **All** button. To remove a column from the list of ignored columns, click the X next to the column name. To remove all columns from the list of ignored columns, click the **None** button. To search for a specific column, type the column name in the **Search** field above the column list. To only show columns with a specific percentage of missing values, specify the percentage in the **Only show columns with more than 0% missing values** field. To change the selections for the hidden columns, use the **Select Visible** or **Deselect Visible** buttons.

-  `weights_column <algo-params/weights_column.html>`__: Specify a column to use for the observation weights, which are used for bias correction. The specified ``weights_column`` must be included in the specified ``training_frame``.

   *Python only*: To use a weights column when passing an H2OFrame to ``x`` instead of a list of column names, the specified ``training_frame`` must contain the specified ``weights_column``.

   **Note**: Weights are per-row observation weights and do not increase the size of the data frame. A weight is typically the number of times a row is repeated, but non-integer values are supported as well. During training, rows with higher weights matter more because of the larger loss function pre-factor.

-  `offset_column <algo-params/offset_column.html>`__: Specify a column to use as the offset.

   **Note**: Offsets are per-row "bias values" that are used during model training. For Gaussian distributions, they can be seen as simple corrections to the response (``y``) column. Instead of learning to predict the response (y-row), the model learns to predict the (row) offset of the response column. For other distributions, the offset corrections are applied in the linearized space before applying the inverse link function to get the actual response values. 

-  `max_iterations <algo-params/max_iterations.html>`__: A positive integer defining the maximum number of iterations during model training. The role this parameter plays in the stopping criteria of the model-fitting algorithm is explained in the :ref:`coxph_algorithm` section below. This value defaults to ``20``.

-  `use_all_factor_levels <algo-params/coxph.html>`__: Specify whether to use all factor levels in the possible set of predictors; if you enable this option, sufficient regularization is required. By default, the first factor level is skipped. This option defaults to ``False`` (disabled).

-  `export_checkpoints_dir <algo-params/export_checkpoints_dir.html>`__: Specify a directory to which generated models will automatically be exported.

-  `single_node_mode <algo-params/single_node_mode.html>`__: Specify whether to run on a single node for fine-tuning of model parameters. Running on a single node reduces the effect of network overhead (for smaller datasets). This option defaults to ``False`` (disabled).
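
As a rough sketch of how these parameters fit together in Python, the helper below collects constructor arguments using the defaults stated above (``ties="efron"``, ``lre_min=9``, ``max_iterations=20``) and passes them to ``H2OCoxProportionalHazardsEstimator``. The ``build_coxph_kwargs`` and ``train_coxph`` helpers and the ``start``/``stop`` column names are assumptions made for illustration; the H2O import is deferred so that the parameter handling can be tried without a running cluster.

```python
def build_coxph_kwargs(stop_column, start_column=None, **overrides):
    """Collect CoxPH constructor arguments (hypothetical helper).

    Starts from the defaults listed in this section and applies overrides.
    """
    kwargs = {
        "stop_column": stop_column,    # required
        "start_column": start_column,  # optional
        "ties": "efron",               # or "breslow"
        "lre_min": 9,
        "max_iterations": 20,
    }
    kwargs.update(overrides)

    # Basic checks that follow the parameter descriptions above.
    if kwargs["ties"] not in ("efron", "breslow"):
        raise ValueError("ties must be 'efron' or 'breslow'")
    if kwargs["lre_min"] <= 0:
        raise ValueError("lre_min must be positive")
    if kwargs["max_iterations"] <= 0:
        raise ValueError("max_iterations must be a positive integer")
    return kwargs


def train_coxph(training_frame, x, event_column, **kwargs):
    """Train a CoxPH model; requires the h2o package and a running cluster."""
    # Imported lazily so the rest of this sketch runs without H2O installed.
    from h2o.estimators.coxph import H2OCoxProportionalHazardsEstimator

    model = H2OCoxProportionalHazardsEstimator(**kwargs)
    model.train(x=x, y=event_column, training_frame=training_frame)
    return model


# Hypothetical column names; only ``ties`` is changed from its default.
params = build_coxph_kwargs("stop", start_column="start", ties="breslow")
print(params)
```

Calling ``train_coxph(frame, x=[...], event_column="event", **params)`` would then fit the model, assuming ``frame`` is an H2OFrame that contains the named columns.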
{noformat}
h2o-ops commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-9051
Assignee: hannah.tillman
Reporter: hannah.tillman
State: Resolved
Fix Version: 3.40.0.4
Attachments: N/A
Development PRs: Available

h2o-ops commented 1 year ago

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/6678