h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

GridSearch using cross validation and parallelism fails #7225

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

When running the following code (a slight alteration of the example in https://docs.h2o.ai/h2o/latest-stable/h2o-docs/grid-search.html#grid-search-in-python) with nfolds != 0 and parallelism != 1, nfolds is not set correctly (i.e., cross-validation is not used) and the failure "Cannot update - Lockable is not write-locked!" is reported (see the complete message below).

Code:

{code:python}
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch

h2o.init()

# Import a sample binary outcome dataset into H2O
data = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_train_10k.csv")
test = h2o.import_file("https://s3.amazonaws.com/erin-data/higgs/higgs_test_5k.csv")

# Identify predictors and response
x = data.columns
y = "response"
x.remove(y)

# For binary classification, response should be a factor
data[y] = data[y].asfactor()
test[y] = test[y].asfactor()

# Split data into train & validation
ss = data.split_frame(seed = 1)
train = ss[0]
valid = ss[1]

# GBM hyperparameters
gbm_params1 = {'learn_rate': [0.01, 0.1],
               'max_depth': [3, 5, 9],
               'sample_rate': [0.8, 1.0],}

# Train and validate a cartesian grid of GBMs
gbm_grid1 = H2OGridSearch(model=H2OGradientBoostingEstimator(nfolds=2),
                          grid_id='gbm_grid1',
                          hyper_params=gbm_params1,
                          parallelism=2,)

gbm_grid1.train(x=x, y=y,
                training_frame=train,
                validation_frame=valid,
                ntrees=100,
                seed=1)

# Get the grid results, sorted by validation AUC
gbm_gridperf1 = gbm_grid1.get_grid(sort_by='auc', decreasing=True)
gbm_gridperf1

# Grab the top GBM model, chosen by validation AUC
best_gbm1 = gbm_gridperf1.models[0]

# Now let's evaluate the model performance on a test set
# so we get an honest estimate of top model performance
best_gbm_perf1 = best_gbm1.model_performance(test)

best_gbm_perf1.auc()
# 0.7781778619721595

from pprint import pprint
pprint(best_gbm1.params["nfolds"])
{code}
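The report prints `nfolds` only for the best model. To see whether cross-validation was dropped for some or all grid members, one could check every model. The sketch below uses a hypothetical helper, `models_missing_cv`, that works on plain dictionaries so it runs without a cluster. It assumes each `params` mapping stores the effective value under an `'actual'` key, which should be verified against the installed h2o version. The sample data is made up for illustration and is not output from the run above.

```python
def models_missing_cv(params_by_model, expected_nfolds):
    """Return the sorted ids of models whose effective nfolds differs
    from expected_nfolds.

    params_by_model maps a model id to a params-style dict, e.g.
    {"nfolds": {"default": 0, "actual": 2}} (this layout is an assumption).
    """
    missing = []
    for model_id, params in params_by_model.items():
        # A missing "nfolds" entry is treated as "cross-validation not set".
        actual = params.get("nfolds", {}).get("actual")
        if actual != expected_nfolds:
            missing.append(model_id)
    return sorted(missing)


# Illustrative, made-up data (hypothetical model ids).
example = {
    "gbm_grid1_model_1": {"nfolds": {"default": 0, "actual": 2}},
    "gbm_grid1_model_2": {"nfolds": {"default": 0, "actual": 0}},
}
print(models_missing_cv(example, expected_nfolds=2))
```

With a live grid, the input could presumably be built as `{m.model_id: m.params for m in gbm_grid1.models}`; that usage is untested here.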

Error:

Hyper-parameter: learn_rate, 0.01
Hyper-parameter: max_depth, 9
Hyper-parameter: sample_rate, 0.8
failure_details: Cannot update - Lockable is not write-locked!
failure_stack_traces: java.lang.AssertionError: Cannot update - Lockable is not write-locked!
    at water.Lockable$Update.atomic(Lockable.java:196)
    at water.Lockable$Update.atomic(Lockable.java:191)
    at water.TAtomic.atomic(TAtomic.java:17)
    at water.Atomic.compute2(Atomic.java:56)
    at water.Atomic.fork(Atomic.java:39)
    at water.Atomic.invoke(Atomic.java:31)
    at water.Lockable.update(Lockable.java:186)
    at water.Lockable.update(Lockable.java:183)
    at hex.grid.GridSearch$ModelFeeder.onBuildSuccess(GridSearch.java:218)
    at hex.ParallelModelBuilder$ParallelModelBuiltListener.onModelSuccess(ParallelModelBuilder.java:59)
    at hex.ModelBuilder$Driver.onCompletion(ModelBuilder.java:253)
    at jsr166y.CountedCompleter.__tryComplete(CountedCompleter.java:425)
    at jsr166y.CountedCompleter.tryComplete(CountedCompleter.java:383)
    at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:246)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1563)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
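When several grid members fail, it can help to pull the failing hyper-parameter combination out of the failure text. The following is a minimal sketch using a hypothetical helper, `parse_failed_hyper_params`. It assumes the `Hyper-parameter: name, value` layout shown in this report and that all values are numeric; it is not part of the h2o API.

```python
import re

# Matches entries such as "Hyper-parameter: learn_rate, 0.01".
_HYPER_PARAM = re.compile(
    r"Hyper-parameter:\s*(\w+),\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)"
)


def parse_failed_hyper_params(report):
    """Extract {name: value} pairs from a failure report string.

    Values are converted to float; non-numeric values are not matched.
    """
    return {name: float(value) for name, value in _HYPER_PARAM.findall(report)}


report = (
    "Hyper-parameter: learn_rate, 0.01 "
    "Hyper-parameter: max_depth, 9 "
    "Hyper-parameter: sample_rate, 0.8 "
    "failure_details: Cannot update - Lockable is not write-locked!"
)
print(parse_failed_hyper_params(report))
```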

exalate-issue-sync[bot] commented 1 year ago

Carsten Gieshoff commented: I apologize for the inconvenience of the duplicate [[PUBDEV-8434] GridSearch using cross validation and parallelism fails - JIRA (atlassian.net)|https://h2oai.atlassian.net/browse/PUBDEV-8435]

exalate-issue-sync[bot] commented 1 year ago

Carsten Gieshoff commented: The issue no longer occurs with h2o version 3.36.0.2.
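A script that relies on parallel grid search with cross-validation could guard against older installations by comparing version numbers. The sketch below uses hypothetical helpers, `version_tuple` and `is_at_least`. It assumes purely numeric, dot-separated version strings and compares them as integer tuples, because plain string comparison would misorder them (for example, "3.4" against "3.36"). The thread only states that the issue was not observed at 3.36.0.2; which earlier versions are affected is not stated.

```python
def version_tuple(version):
    """Convert a dotted version such as '3.36.0.2' into a tuple of ints.

    Raises ValueError if any part is not numeric.
    """
    return tuple(int(part) for part in version.split("."))


def is_at_least(installed, minimum="3.36.0.2"):
    """Return True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


print(is_at_least("3.36.0.2"))
print(is_at_least("3.34.0.8"))
```

In practice the installed version string would likely come from `h2o.__version__`; treat that as an assumption to check against the installed package.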

h2o-ops-ro commented 1 year ago

JIRA Issue Details

Jira Issue: PUBDEV-8435
Assignee: Michal Kurka
Reporter: Carsten Gieshoff
State: Open
Fix Version: Backlog
Attachments: N/A
Development PRs: N/A