Bit of a weird one that I'm hoping someone may have encountered before and can give some direction on. I'm getting variable errors during EvalCallback.on_epoch_end. These errors change between runs (see examples below) and seem to relate to data inconsistencies. If I step through the code in debug mode there is no problem, and training works fine in subsequent epochs. If I include time.sleep(1) at the start of the callback execution, no errors are thrown.
My best guess is that some data used by the callback has not been fully initialized when the first call to EvalCallback.on_epoch_end is made. However, I'm not sure whether the issue comes from the underlying tensorflow/keras level or from the tfsim level.
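For reference, the sleep workaround is applied roughly like this. It's only a minimal sketch; the DelayedEvalCallback subclass is something added locally for the test, not part of tfsim:

```python
import time

from tensorflow_similarity.callbacks import EvalCallback


class DelayedEvalCallback(EvalCallback):
    """Local workaround: pause briefly before the stock evaluation runs."""

    def on_epoch_end(self, epoch, logs=None):
        # Without this pause, the first call intermittently raises the errors below.
        time.sleep(1)
        super().on_epoch_end(epoch, logs)
```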
Error Examples
Epoch 1/800
62/62 [==============================] - ETA: 0s - loss: 332.3957 - proj_std: 0.0441
Traceback (most recent call last):
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 194, in <module>
main()
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 178, in main
history = contrastive_model.fit(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\callbacks.py", line 188, in on_epoch_end
known_results = _compute_metrics(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\callbacks.py", line 291, in _compute_metrics
classification_results = evaluator.evaluate_classification(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\evaluators\memory_evaluator.py", line 152, in evaluate_classification
matcher.compute_count(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\classification_match.py", line 177, in compute_count
match_mask, distance_mask = self._compute_match_indicators(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\classification_match.py", line 130, in _compute_match_indicators
d_labels, d_dist = self.derive_match(lookup_labels, lookup_distances)
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\match_nearest.py", line 55, in derive_match
return lookup_labels[:, :1], lookup_distances[:, :1]
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:GPU:0}} Index out of range using input dim 1; input has only 1 dims [Op:StridedSlice] name: strided_slice/
Epoch 1/800
62/62 [==============================] - ETA: 0s - loss: 331.3144 - proj_std: 0.0441
Traceback (most recent call last):
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 194, in <module>
main()
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 178, in main
history = contrastive_model.fit(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\callbacks.py", line 186, in on_epoch_end
self.model.index(self.targets, self.target_labels, verbose=0)
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\models\contrastive_model.py", line 558, in index
predictions = self.predict(x)
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\models\contrastive_model.py", line 457, in predict
x = self.backbone.predict(
ValueError: can only convert an array of size 1 to a Python scalar
Epoch 1/800
62/62 [==============================] - ETA: 0s - loss: 329.6670 - proj_std: 0.0441
Traceback (most recent call last):
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 194, in <module>
main()
File "C:\Users\chris\Documents\XXXXX\Projects\PythonScratch\tfsim_contrastive_model\train_synthetic.py", line 178, in main
history = contrastive_model.fit(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\callbacks.py", line 188, in on_epoch_end
known_results = _compute_metrics(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\callbacks.py", line 291, in _compute_metrics
classification_results = evaluator.evaluate_classification(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\evaluators\memory_evaluator.py", line 152, in evaluate_classification
matcher.compute_count(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\classification_match.py", line 177, in compute_count
match_mask, distance_mask = self._compute_match_indicators(
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\classification_match.py", line 128, in _compute_match_indicators
ClassificationMatch._check_shape(query_labels, lookup_labels, lookup_distances)
File "C:\Users\chris\anaconda3\envs\PythonScratchTF_Test\lib\site-packages\tensorflow_similarity\matchers\classification_match.py", line 305, in _check_shape
raise ValueError("Number of query labels must match the number of " "lookup_label sets.")
ValueError: Number of query labels must match the number of lookup_label sets.
I'm working pretty close to the unsupervised-learning example notebook, with the following key exceptions (rough sketch of the setup below):
a custom dataset with input shape (None, 64, 64, 1)
the backbone is the same as in the supervised-learning notebook
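Sketch of how the callback is wired up (variable names are placeholders from my script, and the EvalCallback arguments are from memory of the tfsim API, so they may differ slightly between versions):

```python
import tensorflow_similarity as tfsim

# Held-out query/index splits sampled as in the unsupervised-learning notebook;
# in my case the inputs have shape (64, 64, 1).
eval_callback = tfsim.callbacks.EvalCallback(
    queries=x_query,
    query_labels=y_query,
    targets=x_index,
    target_labels=y_index,
    metrics=["binary_accuracy"],
)

history = contrastive_model.fit(
    train_ds,
    epochs=800,
    callbacks=[eval_callback],
)
```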
Thanks @shinstra, I'll try to take a look into this. The lookup error may be caused by something in the result set returned by nmslib, but I'll have to dig into the other errors to find out more.
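If it helps narrow down the nmslib side, the raw lookup sets could be inspected directly after indexing with something along these lines (sketch only; the lookup call is from memory of the tfsim indexing API and may differ by version):

```python
# Re-index the targets, then check that every query actually gets back
# a non-empty neighbour set from the nmslib-backed index.
contrastive_model.index(x_index, y_index, verbose=0)
lookups = contrastive_model.lookup(x_query, k=1, verbose=0)

for i, neighbours in enumerate(lookups):
    if len(neighbours) != 1:
        print(f"query {i}: got {len(neighbours)} neighbours instead of 1")
```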
For reference, I'm using python==3.8.16 and tensorflow==2.10.1.