h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Introduced MaterializedRandomWalker #8625

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

The RandomDiscreteValueWalker is not very efficient: it picks indices into the grid of hyperparameters at random and loses time repeatedly drawing indices until it finds one that has not been seen yet.
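To illustrate the cost described above, here is a minimal Java sketch of index-based random search with rejection. It is not H2O's RandomDiscreteValueWalker: the class name RejectionSamplingWalker and the flat-index view of the grid are assumptions made for illustration. Once most of the grid has been visited, almost every draw hits an already-seen index, so a full traversal needs on the order of n ln n draws (the coupon collector's problem) rather than n.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical sketch of index-based random search with rejection:
 * each step draws a random flat index into the grid and retries
 * until it finds one that has not been visited yet.
 */
public class RejectionSamplingWalker {
    private final int gridSize;
    private final Random random;
    private final Set<Integer> visited = new HashSet<>();
    private long draws = 0; // total random draws, including rejected duplicates

    public RejectionSamplingWalker(int gridSize, long seed) {
        this.gridSize = gridSize;
        this.random = new Random(seed);
    }

    public boolean hasNext() {
        return visited.size() < gridSize;
    }

    /** Returns the next unseen flat index, or -1 when the grid is exhausted. */
    public int next() {
        if (!hasNext()) return -1;
        while (true) {
            draws++;
            int candidate = random.nextInt(gridSize);
            if (visited.add(candidate)) return candidate; // add() returns false for an already-seen index
        }
    }

    public long draws() { return draws; }

    public static void main(String[] args) {
        RejectionSamplingWalker walker = new RejectionSamplingWalker(1000, 42L);
        while (walker.hasNext()) walker.next();
        // For n = 1000 the expected number of draws is n * H(n), roughly 7,500,
        // even though only 1000 of them yield a new grid entry.
        System.out.println("draws needed: " + walker.draws());
    }
}
```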

We can materialise the whole grid, shuffle it, and then poll items sequentially.
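A minimal sketch of this approach, assuming the hyperparameter space is a map from parameter name to candidate values. MaterializedShuffleWalker and its methods are hypothetical names, not the API added in the linked PR, and the parameter values in main are only illustrative. The full Cartesian product is built once, shuffled with a seeded Random, and then polled in order, so every poll returns an unseen combination and a full traversal takes exactly n steps after the one-off materialisation and shuffle.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/**
 * Hypothetical sketch: materialise every combination of hyperparameter values,
 * shuffle once, then hand out combinations sequentially.
 */
public class MaterializedShuffleWalker {
    private final Deque<Map<String, Object>> queue;

    public MaterializedShuffleWalker(LinkedHashMap<String, Object[]> hyperSpace, long seed) {
        List<Map<String, Object>> all = cartesianProduct(hyperSpace);
        // Collections.shuffle is a single O(n) pass; the seed makes the order reproducible
        Collections.shuffle(all, new Random(seed));
        this.queue = new ArrayDeque<>(all);
    }

    /** Builds the full Cartesian product of the candidate values, one map per grid entry. */
    static List<Map<String, Object>> cartesianProduct(LinkedHashMap<String, Object[]> hyperSpace) {
        List<Map<String, Object>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>()); // start from a single empty combination
        for (Map.Entry<String, Object[]> param : hyperSpace.entrySet()) {
            List<Map<String, Object>> extended = new ArrayList<>();
            for (Map<String, Object> partial : combos) {
                for (Object value : param.getValue()) {
                    Map<String, Object> combo = new LinkedHashMap<>(partial);
                    combo.put(param.getKey(), value);
                    extended.add(combo);
                }
            }
            combos = extended;
        }
        return combos;
    }

    public boolean hasNext() { return !queue.isEmpty(); }

    /** Returns the next unseen combination, or null when the grid is exhausted. */
    public Map<String, Object> next() { return queue.poll(); }

    public int remaining() { return queue.size(); }

    public static void main(String[] args) {
        LinkedHashMap<String, Object[]> space = new LinkedHashMap<>();
        space.put("max_depth", new Object[]{3, 5, 7});   // illustrative values only
        space.put("learn_rate", new Object[]{0.01, 0.1});
        MaterializedShuffleWalker walker = new MaterializedShuffleWalker(space, 42L);
        while (walker.hasNext()) {
            System.out.println(walker.next()); // each of the 6 combinations exactly once
        }
    }
}
```

The trade-off is memory: the whole grid is held at once, and its size grows multiplicatively with every hyperparameter added to the search space.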

Having the whole hyperspace materialised might also be useful (in fact, required) for Bayesian approaches to hyperparameter optimisation.

Micro-benchmarks show that the new approach is about twice as fast (task: traversing the whole grid):

Benchmark                          Mode  Cnt  Score   Error  Units
OldRGSBench.traverseWholeGrid      avgt    3  1.529 ± 0.024  ms/op
NewRGSBench.traverseWholeGridNew   avgt    3  0.755 ± 0.053  ms/op

It is also more efficient to apply filters once to the materialised collection of all grid entries, since chaining filter functions narrows down the input step by step. The current implementation works with hash codes of indices, so in order to skip some grid entries we would spend time constructing ModelParameters, or at least an Object[] array of values, because users will most likely want to specify dependencies between hyperparameters based on values (not indices).
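Once the grid is materialised, dependencies between hyperparameters can be written as predicates over values and applied once, each filter narrowing the output of the previous one. The sketch below assumes each grid entry is a Map from parameter name to value; GridFilterChain and both filter rules are hypothetical examples, not constraints that H2O defines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical sketch: value-based filters applied once to a materialised grid. */
public class GridFilterChain {

    /** Narrows the grid step by step: each filter only sees what the previous one kept. */
    static List<Map<String, Object>> applyFilters(List<Map<String, Object>> grid,
                                                  List<Predicate<Map<String, Object>>> filters) {
        List<Map<String, Object>> current = grid;
        for (Predicate<Map<String, Object>> keep : filters) {
            current = current.stream().filter(keep).collect(Collectors.toList());
        }
        return current;
    }

    /** A tiny materialised grid: max_depth in {3, 5, 7} x learn_rate in {0.01, 0.1}. */
    static List<Map<String, Object>> smallGrid() {
        List<Map<String, Object>> grid = new ArrayList<>();
        for (int depth : new int[]{3, 5, 7}) {
            for (double rate : new double[]{0.01, 0.1}) {
                Map<String, Object> entry = new LinkedHashMap<>();
                entry.put("max_depth", depth);
                entry.put("learn_rate", rate);
                grid.add(entry);
            }
        }
        return grid;
    }

    /** Two arbitrary example rules, expressed on parameter values rather than indices. */
    static List<Predicate<Map<String, Object>>> exampleFilters() {
        return List.of(
            // skip deep trees combined with a high learning rate
            e -> !((Integer) e.get("max_depth") >= 7 && (Double) e.get("learn_rate") > 0.05),
            // only allow the high learning rate when max_depth > 3
            e -> (Integer) e.get("max_depth") > 3 || (Double) e.get("learn_rate") <= 0.05
        );
    }

    public static void main(String[] args) {
        List<Map<String, Object>> kept = applyFilters(smallGrid(), exampleFilters());
        kept.forEach(System.out::println);
    }
}
```

On this 3 x 2 grid the two filters keep 4 of the 6 combinations, and no ModelParameters object is built for the entries that are skipped.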

exalate-issue-sync[bot] commented 1 year ago

Andrey Spiridonov commented: The “Support for handling interdependent hyper parameters in the grid” part of the original task was moved into a separate Jira issue, PUBDEV-7037 (https://0xdata.atlassian.net/browse/PUBDEV-7037), as there is no strong dependency between materialisation and filtering.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7015
Assignee: Andrey Spiridonov
Reporter: Andrey Spiridonov
State: In Progress
Fix Version: N/A
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

https://github.com/h2oai/h2o-3/pull/4021