h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Suspicious Assertion error when running R test in a client mode on a 2 node cluster #9425

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

This happens on H2O 3.22.0.1 and later. To reproduce:

{code}
git checkout jenkins-3.22.0.1
./gradlew clean build -x test
cd h2o-r/tests
../../scripts/run.py --wipeall --client --numnodes 2 --jvm.xmx 4g --test runit_automl_binomial_leaderboard_higgs.R
{code}

The test fails with an AssertionError:

{code}
DistributedException from /172.16.2.109:40000: ' Attempting to block on task (class water.Lockable$PriorWriteLock) with equal or lower priority. Can lead to deadlock! 120 <= 120', caused by java.lang.AssertionError: Attempting to block on task (class water.Lockable$PriorWriteLock) with equal or lower priority. Can lead to deadlock! 120 <= 120
    at water.RPC.result(RPC.java:241)
    at water.RPC.get(RPC.java:257)
    at water.Atomic.invoke(Atomic.java:32)
    at ai.h2o.automl.Leaderboard.addModels(Leaderboard.java:318)
    at ai.h2o.automl.Leaderboard.addModel(Leaderboard.java:365)
    at ai.h2o.automl.AutoML.addModel(AutoML.java:1544)
    at ai.h2o.automl.AutoML.pollAndUpdateProgress(AutoML.java:598)
    at ai.h2o.automl.AutoML.pollAndUpdateProgress(AutoML.java:521)
    at ai.h2o.automl.AutoML.defaultRandomForest(AutoML.java:994)
    at ai.h2o.automl.AutoML.learn(AutoML.java:1256)
    at ai.h2o.automl.AutoML.run(AutoML.java:478)
    at ai.h2o.automl.H2OJob$1.compute2(H2OJob.java:32)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1310)
    at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
    at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
    at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
    at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
    at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Caused by: java.lang.AssertionError: *** Attempting to block on task (class water.Lockable$PriorWriteLock) with equal or lower priority. Can lead to deadlock! 120 <= 120
    at water.RPC.get(RPC.java:252)
    at water.Atomic.invoke(Atomic.java:32)
    at water.Lockable.delete(Lockable.java:102)
    at water.Lockable.delete(Lockable.java:93)
    at ai.h2o.automl.Leaderboard$1.atomic(Leaderboard.java:264)
    at ai.h2o.automl.Leaderboard$1.atomic(Leaderboard.java:228)
    at water.TAtomic.atomic(TAtomic.java:17)
    at water.Atomic.compute2(Atomic.java:56)
    at water.H2O$H2OCountedCompleter.compute1(H2O.java:1313)
    at ai.h2o.automl.Leaderboard$1$Icer.compute1(Leaderboard$1$Icer.java)
    at water.H2O$H2OCountedCompleter.compute(H2O.java:1309)
    ... 5 more
{code}
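The "Caused by" section shows the nesting. Inside the leaderboard's atomic update (`Leaderboard$1.atomic`), `Lockable.delete` invokes a `Lockable$PriorWriteLock` atomic and blocks on it through `RPC.get`. The assertion message suggests a rule: a task may only block on a remote task whose priority is strictly higher than its own. The likely reason is that tasks at the same priority level could otherwise all wait on each other and deadlock. Here both sides are at 120.

The sketch below is a hypothetical model of that rule, based only on the error text. `PriorityGuard`, `canBlockOn` and `checkBlock` are illustrative names, not H2O's API. It also assumes that the two numbers in "120 <= 120" are the task's priority and the caller's priority, in that order.

```java
// Hypothetical model of the guard implied by the assertion message.
// PriorityGuard, canBlockOn and checkBlock are illustrative names, NOT H2O API.
final class PriorityGuard {
    private PriorityGuard() {}

    /** A caller may block only on a task whose priority is strictly higher than its own. */
    static boolean canBlockOn(int callerPriority, int taskPriority) {
        return taskPriority > callerPriority;
    }

    /** Throws an AssertionError shaped like the one in the trace when the rule is violated. */
    static void checkBlock(int callerPriority, int taskPriority, String taskClass) {
        if (!canBlockOn(callerPriority, taskPriority)) {
            throw new AssertionError("Attempting to block on task (class " + taskClass
                + ") with equal or lower priority. Can lead to deadlock! "
                + taskPriority + " <= " + callerPriority);
        }
    }

    public static void main(String[] args) {
        // Allowed: the blocked-on task has a strictly higher priority than the caller.
        checkBlock(119, 120, "water.Lockable$PriorWriteLock");
        // Rejected: equal priorities, as in the reported failure.
        try {
            checkBlock(120, 120, "water.Lockable$PriorWriteLock");
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this model, the failure appears only when the leaderboard update and the write lock it waits on run at the same priority. That is consistent with the issue showing up only when the work is distributed, here in client mode on a 2-node cluster.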

exalate-issue-sync[bot] commented 1 year ago

Michal Kurka commented: We should go back in history and see where the issue originates.

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-6194
Assignee: New H2O Bugs
Reporter: Michal Kurka
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A