ivan-alles / localizer

An object detection python library predicting object coordinates and orientation angles on 2d images.
https://ivan-alles.github.io/localizer/
MIT License

Predict trained model on MacOS #9

Open cansik opened 2 years ago

cansik commented 2 years ago

Thank you for the great work. I just tried out the localizer with the hands_on_demo.py. Capturing the dataset and training seems to work, but during the prediction it seems to have problems (maybe related to #3). I am using the tensorflow metal package and running it on an M1. And yes, I am using a standard RGB webcam.

It would be great if you could give me a hint as to what the problem could be:

Backend MacOSX is interactive backend. Turning interactive mode on.
Metal device set to: Apple M1 Max
systemMemory: 64.00 GB
maxCacheSize: 24.00 GB
2022-08-26 12:14:50.225504: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2022-08-26 12:14:50.225657: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 0 MB memory) -> physical PluggableDevice (device: 0, name: METAL, pci bus id: <undefined>)
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
2022-08-26 12:15:35.760479: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-08-26 12:15:35.892807: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
1/1 [==============================] - 0s 206ms/step
Traceback (most recent call last):
  File "tmp/localizer/localizer/hands_on_demo.py", line 111, in _detect
    objects = self._localizer.predict(input)
  File "tmp/localizer/localizer/predict.py", line 250, in predict
    self._update_input_image_parameters(image.shape)
  File "tmp/localizer/localizer/predict.py", line 109, in _update_input_image_parameters
    self._create_model()
  File "tmp/localizer/localizer/predict.py", line 212, in _create_model
    yx = tf.gather_nd(average_pos, local_max_idx) + tf.cast(local_max_idx[:, 2:], np.float32)
  File "tmp/localizer/venv/lib/python3.9/site-packages/tensorflow/python/util/traceback_utils.py", line 153, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "tmp/localizer/venv/lib/python3.9/site-packages/keras/layers/core/tf_op_layer.py", line 107, in handle
    return TFOpLambda(op)(*args, **kwargs)
  File "tmp/localizer/venv/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got value '<attribute 'shape' of 'numpy.generic' objects>' with type '<class 'getset_descriptor'>'

ivan-alles commented 2 years ago

Looks like the dimension of the input is not an integer. Is the input in self._localizer.predict(input) a numpy array (it should be)?

cansik commented 2 years ago

Yes, it is a numpy array, but with dtype float32 (shape = (297, 528, 3)). I did not change anything in the code, but it seems that your self._make_input already converts it to float32.

To be honest, I do not really understand what is going on in that part of the code: predict.py#L206-L216. It seems that you change the model to include more outputs. Could you please explain why you are recreating the model, so that I can investigate a bit further?

I now split up the call here to see where the error really happens:

cast = tf.cast(local_max_idx[:, 2:], np.float32)
gather = tf.gather_nd(average_pos, local_max_idx)
yx = gather + cast

The tf.cast call is throwing the exception, even though local_max_idx[:, 2:] has a shape of (None, 2), so every dimension is an integer or None.
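
The error text names the offending value as '<attribute 'shape' of 'numpy.generic' objects>' with type getset_descriptor. In CPython that is what you get when an attribute is read from a class instead of an instance, which suggests that something looked up .shape on the np.float32 class passed as the dtype rather than on a tensor. The sketch below reproduces that kind of value using only the standard library. int.real is a stand-in chosen for illustration (the message indicates numpy.generic.shape is the same kind of descriptor), and is_valid_dimension is a hypothetical helper that only mimics the "integer or None or has __index__" rule from the message, not TensorFlow's actual check.

```python
# Stand-in for the NumPy case: int.real is a getset descriptor on a built-in
# type, the same kind of object the error message reports for
# numpy.generic.shape. Nothing here imports NumPy or TensorFlow.

descriptor = int.real   # attribute read from the CLASS: yields the descriptor
value = (5).real        # attribute read from an INSTANCE: yields the real value

descriptor_repr = repr(descriptor)
descriptor_type = type(descriptor).__name__


def is_valid_dimension(obj):
    """Hypothetical check mirroring the rule quoted in the error message:
    a dimension must be None, an integer, or expose __index__."""
    return obj is None or hasattr(obj, "__index__")


print(descriptor_repr, descriptor_type)
print("instance value accepted:", is_valid_dimension(value))
print("class descriptor accepted:", is_valid_dimension(descriptor))
```

If this reading is right, passing tf.float32 or the string "float32" as the dtype might sidestep the problem, since neither is a NumPy class carrying a shape attribute. That is an untested assumption, not a confirmed fix.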

Maybe it could be a dependency problem because you were using numpy 1.19.5 and I am on 1.21.2.

Update: It seems that it is a tensorflow-version problem: https://github.com/keras-team/keras/issues/15536

ivan-alles commented 2 years ago

The model for prediction differs from the model for training, but I cannot explain this in just a couple of words.

JakesMD commented 1 year ago

Hi! Any update / fix? I've got exactly the same issue on my Raspberry Pi.