Closed: montanier closed this issue 2 years ago
Sounds like this pull request should solve the issue. Does your problem persist if you use the version of keras-tuner from the github repo rather than from pip?
Yes, it looks like this should fix the issue. However, another error seems to have been introduced:
Traceback (most recent call last):
  File "tuning.py", line 66, in <module>
    callbacks=[tf.keras.callbacks.EarlyStopping("val_accuracy")],
  File "/home/keras-tuner/keras_tuner/engine/base_tuner.py", line 178, in search
    self.on_trial_begin(trial)
  File "/home/keras-tuner/keras_tuner/engine/base_tuner.py", line 240, in on_trial_begin
    self._display.on_trial_begin(self.oracle.get_trial(trial.trial_id))
  File "/home/keras-tuner/keras_tuner/engine/tuner_utils.py", line 109, in on_trial_begin
    self.trial_number = int(trial.trial_id) + 1
ValueError: invalid literal for int() with base 10: '9f5711e7578f752031b104016b181877'
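For readers following along, here is a minimal sketch of why that last frame fails: the trial ID in the traceback is a hex string, so a base-10 int() conversion raises. The helper name parse_trial_number is hypothetical and exists only for illustration; this is not the change made in keras-tuner, just one way to see the failure and a tolerant alternative.

```python
# The trial ID from the traceback: a hex digest, not a decimal counter.
trial_id = "9f5711e7578f752031b104016b181877"


def parse_trial_number(raw_id):
    """Hypothetical helper: return a 1-based trial number for numeric IDs,
    or None when the ID is not a base-10 integer string."""
    try:
        return int(raw_id) + 1
    except ValueError:
        return None


# A zero-padded numeric ID converts fine; the hex ID does not.
numeric_result = parse_trial_number("0003")
hashed_result = parse_trial_number(trial_id)

print(numeric_result)
print(hashed_result)
```

Whether a display should fall back to None, keep its own counter, or do something else for non-numeric IDs is a design choice this sketch does not settle.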
I was wondering whether a test should have been introduced in #664. What do you think?
See #668
Re: a test in #664: #664 extends #650, where a test was introduced. (Both are unrelated to your new issue.)
Thanks :)
Bug description
When running a parallel search, the Bayesian oracle fails once the initial points are exhausted. The error log is as follows:
We can see in the source code that the _x_train variable of GaussianProcessRegressor is initialized in fit(), but the following call to _vectorize_trials() ends up calling predict() before fit(). https://github.com/keras-team/keras-tuner/blob/a9a384ab4158edb306acbc21e2c7599f79ab8424/keras_tuner/tuners/bayesian.py#L247
Reproduce the bug
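Independently of the distributed setup listed in the files that follow, the ordering problem described above can be illustrated in isolation. The sketch below is not keras-tuner code: TinyRegressor and predict_or_default are hypothetical names, and the 1-nearest-neighbour logic is only a stand-in for a real regressor. It shows a model whose training state is created in fit(), what happens when predict() is called first, and one possible guard.

```python
class TinyRegressor:
    """Hypothetical stand-in for a regressor whose state is set in fit()."""

    def __init__(self):
        self._x_train = None  # populated only by fit()
        self._y_train = None

    def fit(self, x, y):
        self._x_train = list(x)
        self._y_train = list(y)

    def predict(self, x):
        if self._x_train is None:
            # Fail loudly instead of touching uninitialized state.
            raise RuntimeError("predict() called before fit()")
        # 1-nearest-neighbour prediction on scalar inputs.
        predictions = []
        for xi in x:
            nearest = min(
                range(len(self._x_train)),
                key=lambda i: abs(self._x_train[i] - xi),
            )
            predictions.append(self._y_train[nearest])
        return predictions


def predict_or_default(model, x, default=0.0):
    """Hypothetical guard: return a default value per input if unfitted."""
    try:
        return model.predict(x)
    except RuntimeError:
        return [default] * len(x)


model = TinyRegressor()
before_fit = predict_or_default(model, [1.0, 2.0])  # model not fitted yet
model.fit([0.0, 10.0], [5.0, 7.0])
after_fit = predict_or_default(model, [1.0, 9.0])   # model fitted

print(before_fit)
print(after_fit)
```

A guard like this only masks the symptom; whether the oracle should fit before vectorizing ongoing trials, or skip prediction until a fit has happened, depends on the actual code path linked above.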
Files
All files are stored in the same directory
Dockerfile:
docker-compose.yml:
tuning.py
Commands
Start the Chief:
Start the Worker 1:
Start the Worker 2:
Expected behavior
We expect all workers to run without error until the end of the optimization.
Additional context
This error is aggravated when running in TFX: the failure of one or more workers makes the whole tuning operation fail.
Would you like to help us fix it?
I can try. What is the strategy for fixing it?