When running a random grid search (RGS), we may want to favor some parameter values over others.
This can be “partly done” today, for example by duplicating a parameter value, but the result is not quite as expected:
* when trying to generate new random hyper-parameters, duplicates of previous parameter permutations are detected by index, not by value. Therefore, with the hyper-param {{param_dummy = ['A', 'A', 'B']}}, we effectively double the probability of using value {{'A'}}, but since {{'A'}} (idx 0) is different from {{'A'}} (idx 1), the walker considers them 2 different values and {{GridSearch}} may try to train 2 models with exactly the same hyper-parameters.
* to avoid training duplicates (and also to resume an existing grid), {{GridSearch}} first tries to find an existing model by {{checksum}}. However, even when a model is found this way, it is added to the grid and counted as an additional model (which also impacts the {{max_models}} behaviour).
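The index-vs-value mismatch can be illustrated with a small Python sketch. This is not the H2O walker code; the helpers {{permutations_by_index}} and {{to_values}} are hypothetical names used only to show why a duplicated value yields permutations that look distinct by index but are identical by value.

```python
import itertools

# Hyper-parameter space with a duplicated value, as in the example above.
hyper_params = {"param_dummy": ["A", "A", "B"]}


def permutations_by_index(space):
    """Enumerate every combination as a tuple of value indices (one per param)."""
    names = sorted(space)
    return list(itertools.product(*(range(len(space[n])) for n in names)))


def to_values(space, index_tuple):
    """Map a tuple of indices back to the actual parameter values."""
    names = sorted(space)
    return tuple(space[n][i] for n, i in zip(names, index_tuple))


index_perms = permutations_by_index(hyper_params)
value_perms = [to_values(hyper_params, p) for p in index_perms]

# A duplicate check keyed on indices sees more distinct permutations
# than a check keyed on the values themselves.
distinct_by_index = len(set(index_perms))
distinct_by_value = len(set(value_perms))
print(distinct_by_index, distinct_by_value)
```

With an index-based check, (0,) and (1,) both pass as new permutations even though both resolve to {{'A'}}, which is how two identical models can end up being requested.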
To avoid the issues above, I suggest allowing explicit weights to be provided for some parameters through a {{meta}} parameter that uses the {{$}} separator, e.g. {{param_dummy$weights}} for {{param_dummy}}.
The walkers supporting weights (currently only {{RandomDiscreteValueWalker}}) will then be able to extract those meta-params, validate them (ensure they are ints and that there are as many weights as values in the corresponding param…), and use them to tweak the random hyper-param selector.
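As a rough illustration of the extract / validate / select steps, here is a minimal Python sketch. It is not the {{RandomDiscreteValueWalker}} implementation: the function names {{split_weights}} and {{sample}} are hypothetical, and the extra checks (non-negative weights, positive total) are assumptions that go beyond the "ints, same size" validation mentioned above.

```python
import random

META_SEP = "$"              # separator taken from the proposal
WEIGHTS_SUFFIX = META_SEP + "weights"


def split_weights(hyper_params):
    """Separate plain params from `<name>$weights` meta-params and validate the weights."""
    params = {k: v for k, v in hyper_params.items() if META_SEP not in k}
    weights = {}
    for key, value in hyper_params.items():
        if not key.endswith(WEIGHTS_SUFFIX):
            continue
        name = key[: -len(WEIGHTS_SUFFIX)]
        if name not in params:
            raise ValueError(f"weights given for unknown param {name!r}")
        if len(value) != len(params[name]):
            raise ValueError(f"{key} must have as many entries as {name}")
        if not all(isinstance(w, int) and not isinstance(w, bool) for w in value):
            raise ValueError(f"{key} must contain only ints")
        # Assumption: weights are non-negative and at least one is positive.
        if any(w < 0 for w in value) or sum(value) <= 0:
            raise ValueError(f"{key} must be non-negative with a positive total")
        weights[name] = value
    return params, weights


def sample(params, weights, rng):
    """Draw one value per param; params without weights are sampled uniformly."""
    return {
        name: rng.choices(values, weights=weights.get(name), k=1)[0]
        for name, values in params.items()
    }


params, weights = split_weights(
    {"param_dummy": ["A", "B"], "param_dummy$weights": [2, 1]}
)
rng = random.Random(42)
print(sample(params, weights, rng))
```

Because the value list itself contains no duplicates, a duplicate check on indices and one on values agree, while {{'A'}} is still drawn about twice as often as {{'B'}}.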
h4. Benefits of this syntax (meta-param) over an additional method parameter:
* doesn’t require any API change.
* hyper-params are always passed as strings, so we can use the {{$}} separator without risk of conflict.
* weights can easily be declared right below the corresponding param on any client (including when creating a Java HashMap) for clarity/visibility.
* weights can also be extracted easily from sub-groups (hyper-parameters support grouping of related params).
Example of the proposed syntax, with the weights declared through a {{meta}} parameter right below the param they apply to:
{code:none}
param_dummy = ['A', 'B']
param_dummy$weights = [2, 1]
{code}
h4. Drawbacks of this syntax: