Closed (BenUze closed this issue 7 months ago)
@BenUze: thank you for bringing this up. I ran into the error too. Will resolve.
@maurever
Just ran the pyunit_cancer_rulefit.py and ran into the following error:
@wendycwong, your error does not correspond with this bug report; see https://github.com/h2oai/h2o-3/pull/15974.
pyunit_cancer_rulefit.py does not reproduce the error that @BenUze mentioned.
@BenUze, could you please share the working code on a sample dataset, if possible?
It looks like the Rulefit model was deleted before finishing the model training... However, I am not able to reproduce the error...
Hi,
Sorry for the late reply; I've been trying to solve the issue on my own. It appears, as @maurever said, that the model is deleted before training finishes. During troubleshooting, I noticed that the created model disappears from H2O Flow during the process. I tried to reproduce the error with the H2O-3 RuleFit example on the Titanic dataset, but it worked as intended. As you will see in the uploaded files, I am using Optuna for hyperparameter optimization, and it didn't cause any issue with the Titanic dataset.
Thank you again for your help
Hi,
While trying to get RuleFit to work again, I've come across a new error message. The code is the same, but I started H2O from a Unix shell with 32 GB dedicated to the JVM.
Error message:
[W 2024-01-31 16:46:10,264] Trial 0 failed with parameters: {'algorithm': 'drf', 'max_num_rules': 7, 'max_rule_length': 1, 'model_type': 'rules', 'min_rule_length': 1} because of the following error: OSError('Job with key $03010a0a1d039c05ffffffff$_81a9d2850b608371db141fd9fd67433c failed with an exception: java.lang.NullPointerException\nstacktrace: \njava.lang.NullPointerException\n\tat water.Lockable$Unlock.atomic(Lockable.java:231)\n\tat water.Lockable$Unlock.atomic(Lockable.java:216)\n\tat water.TAtomic.atomic(TAtomic.java:18)\n\tat water.Atomic.compute2(Atomic.java:56)\n\tat water.Atomic.fork(Atomic.java:39)\n\tat water.Atomic.invoke(Atomic.java:31)\n\tat water.Lockable.unlock(Lockable.java:210)\n\tat water.Lockable.unlock(Lockable.java:205)\n\tat hex.rulefit.RuleFit$RuleFitDriver.computeImpl(RuleFit.java:272)\n\tat hex.ModelBuilder$Driver.compute2(ModelBuilder.java:253)\n\tat water.H2O$H2OCountedCompleter.compute(H2O.java:1689)\n\tat jsr166y.CountedCompleter.exec(CountedCompleter.java:468)\n\tat jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)\n\tat jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)\n\tat jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)\n\tat jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)\n').
Traceback (most recent call last):
File "/home/radars/mambaforge/envs/mlenv_new/lib/python3.11/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
^^^^^^^^^^^
File "/tmp/ipykernel_46934/2371827781.py", line 58, in objective
rfit.train(training_frame = train,
File "/home/radars/mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/estimators/estimator_base.py", line 107, in train
self._train(parms, verbose=verbose)
File "/home/radars/mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/estimators/estimator_base.py", line 199, in _train
job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
File "/home/radars/mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/job.py", line 88, in poll
raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
OSError: Job with key $03010a0a1d039c05ffffffff$_81a9d2850b608371db141fd9fd67433c failed with an exception: java.lang.NullPointerException
stacktrace:
java.lang.NullPointerException
at water.Lockable$Unlock.atomic(Lockable.java:231)
at water.Lockable$Unlock.atomic(Lockable.java:216)
at water.TAtomic.atomic(TAtomic.java:18)
at water.Atomic.compute2(Atomic.java:56)
at water.Atomic.fork(Atomic.java:39)
at water.Atomic.invoke(Atomic.java:31)
at water.Lockable.unlock(Lockable.java:210)
at water.Lockable.unlock(Lockable.java:205)
at hex.rulefit.RuleFit$RuleFitDriver.computeImpl(RuleFit.java:272)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:253)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1689)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
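For readers trying to reproduce this, the trial parameters in the log above suggest a search space roughly like the one sketched here. This is not the reporter's actual code (the uploaded files are not part of this thread). The helper name `suggest_rulefit_params`, the numeric ranges, and the per-trial `model_id` are all assumptions made for illustration; only the parameter names and the single logged combination come from the log.

```python
def suggest_rulefit_params(trial):
    """Build keyword arguments for H2ORuleFitEstimator from an Optuna trial.

    Hypothetical helper: the ranges below are illustrative assumptions,
    not values taken from the original report.
    """
    # Parameter names match those logged for the failing trial.
    algorithm = trial.suggest_categorical("algorithm", ["drf", "gbm"])
    model_type = trial.suggest_categorical(
        "model_type", ["rules_and_linear", "rules", "linear"]
    )
    max_num_rules = trial.suggest_int("max_num_rules", 5, 50)
    a = trial.suggest_int("min_rule_length", 1, 5)
    b = trial.suggest_int("max_rule_length", 1, 5)
    # Keep the pair ordered so the minimum never exceeds the maximum.
    min_rule_length, max_rule_length = sorted((a, b))
    return {
        # Assumption: a distinct key per trial, derived from the trial number.
        "model_id": f"rfit_trial_{trial.number}",
        "algorithm": algorithm,
        "model_type": model_type,
        "max_num_rules": max_num_rules,
        "min_rule_length": min_rule_length,
        "max_rule_length": max_rule_length,
    }


# Usage inside an Optuna objective (requires a running H2O cluster):
#   from h2o.estimators import H2ORuleFitEstimator
#   rfit = H2ORuleFitEstimator(**suggest_rulefit_params(trial))
#   rfit.train(training_frame=train, x=predictors, y=response)
```

Whether a distinct model key per trial has any bearing on the failure is not established in this thread; it is included only so that trials do not share a key in the sketch.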
Hi,
I have found the root cause of my issue. I was using the 'enum' data type for ordinal-encoded string variables. After casting these values to integers, everything works properly again.
However, it remains unexplained why the problem appeared only after I had already used the model successfully on the same dataset with identical preprocessing.
I'm closing the issue.
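The workaround described in this comment, casting ordinal-encoded 'enum' columns to numbers, could look roughly like the sketch below. It is an illustration only: the helper name `cast_enum_columns_to_numeric` and the column names in the usage note are hypothetical, and the thread does not say whether the cast was done on the H2OFrame or earlier (for example in pandas). The sketch assumes an `H2OFrame`-like object and goes through `ascharacter()` first, on the assumption that calling `asnumeric()` directly on an enum column would yield level indices rather than the encoded values.

```python
def cast_enum_columns_to_numeric(frame, columns):
    """Convert the listed enum columns of an H2OFrame-like object to numeric.

    Hypothetical helper. Columns that are not of type 'enum' are left as-is.
    """
    for col in columns:
        # H2OFrame.types maps column name -> type string such as 'enum' or 'int'.
        if frame.types[col] == "enum":
            # enum -> string -> numeric, so the encoded values are parsed
            # instead of being replaced by factor level indices.
            frame[col] = frame[col].ascharacter().asnumeric()
    return frame


# Usage (requires a running H2O cluster; column names are made up):
#   import h2o
#   h2o.init()
#   train = h2o.import_file("train.csv")
#   train = cast_enum_columns_to_numeric(train, ["education_level", "risk_band"])
```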
Versions
H2O version: 3.44.0.2
OS: Ubuntu 20.04.6 LTS
Python version: 3.11
Java version: openjdk version "11.0.21" 2023-10-17; OpenJDK Runtime Environment (build 11.0.21+9-post-Ubuntu-0ubuntu120.04); OpenJDK 64-Bit Server VM (build 11.0.21+9-post-Ubuntu-0ubuntu120.04, mixed mode, sharing)
Actual behavior
In a Jupyter notebook (Python 3.11), the RuleFit model build ends with "OSError('Job with key $03017f00000132d4ffffffff$_ad3f38a6ff226bfeeae60fe6934c4730 failed with an exception: java.lang.AssertionError: Trying to unlock null! (key = rfit1)". Models are built as part of hyperparameter optimization with Optuna. It worked before, and I can't think of any change to the OS or my Python environment that would cause such an issue.
Expected behavior
I expect the model training to complete normally, as it did before.
Steps to reproduce
Steps to reproduce the behavior (with working code on a sample dataset, if possible):
Error message
OSError('Job with key $03017f00000132d4ffffffff$_ad3f38a6ff226bfeeae60fe6934c4730 failed with an exception: java.lang.AssertionError: Trying to unlock null! (key = rfit1)
stacktrace:
java.lang.AssertionError: Trying to unlock null! (key = rfit1)
at water.Lockable$Unlock.atomic(Lockable.java:225)
at water.Lockable$Unlock.atomic(Lockable.java:216)
at water.TAtomic.atomic(TAtomic.java:18)
at water.Atomic.compute2(Atomic.java:56)
at water.Atomic.fork(Atomic.java:39)
at water.Atomic.invoke(Atomic.java:31)
at water.Lockable.unlock(Lockable.java:210)
at water.Lockable.unlock(Lockable.java:205)
at hex.rulefit.RuleFit$RuleFitDriver.computeImpl(RuleFit.java:272)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:253)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1689)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
').
Traceback (most recent call last):
File "/home//mambaforge/envs/mlenv_new/lib/python3.11/site-packages/optuna/study/_optimize.py", line 200, in _run_trial
value_or_values = func(trial)
^^^^^^^^^^^
File "/tmp/ipykernel_56818/2565691037.py", line 59, in objective
rfit.train(training_frame = train,
File "/home//mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/estimators/estimator_base.py", line 107, in train
self._train(parms, verbose=verbose)
File "/home//mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/estimators/estimator_base.py", line 199, in _train
job.poll(poll_updates=self._print_model_scoring_history if verbose else None)
File "/home//mambaforge/envs/mlenv_new/lib/python3.11/site-packages/h2o/job.py", line 88, in poll
raise EnvironmentError("Job with key {} failed with an exception: {}\nstacktrace: "
OSError: Job with key $03017f00000132d4ffffffff$_ad3f38a6ff226bfeeae60fe6934c4730 failed with an exception: java.lang.AssertionError: Trying to unlock null! (key = rfit1)
stacktrace:
java.lang.AssertionError: Trying to unlock null! (key = rfit1)
at water.Lockable$Unlock.atomic(Lockable.java:225)
at water.Lockable$Unlock.atomic(Lockable.java:216)
at water.TAtomic.atomic(TAtomic.java:18)
at water.Atomic.compute2(Atomic.java:56)
at water.Atomic.fork(Atomic.java:39)
at water.Atomic.invoke(Atomic.java:31)
at water.Lockable.unlock(Lockable.java:210)
at water.Lockable.unlock(Lockable.java:205)
at hex.rulefit.RuleFit$RuleFitDriver.computeImpl(RuleFit.java:272)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:253)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1689)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:976)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Thank you for any help