tyagi-iiitv / PointPillars

GNU General Public License v3.0

Trained model, getting Tensor mismatch error in prediction #13

Closed mihsamusev closed 3 years ago

mihsamusev commented 3 years ago

Hi, thanks for the terrific work here and on Medium. I pulled the code from this commit https://github.com/tyagi-iiitv/PointPillars/commit/0e2784c24503a03c705ff773cd673229a4227d27 and then trained a model for 2 days on the KITTI dataset without any issues. As output I obtained ./logs/model.h. Then I saw that point_pillars_prediction.py was incomplete and pulled the latest commit https://github.com/tyagi-iiitv/PointPillars/commit/0e2784c24503a03c705ff773cd673229a4227d27

Running the prediction code gave me this:

Traceback (most recent call last):
  File "point_pillars_prediction.py", line 21, in <module>
    pillar_net.load_weights(os.path.join(MODEL_ROOT, "model.h5"))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2211, in load_weights
    hdf5_format.load_weights_from_hdf5_group(f, self.layers)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 708, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/util/dispatch.py", line 201, in wrapper
    return target(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/backend.py", line 3576, in batch_set_value
    x.assign(np.asarray(value, dtype=dtype(x)))
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 858, in assign
    self._shape.assert_is_compatible_with(value_tensor.shape)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/tensor_shape.py", line 1134, in assert_is_compatible_with
    raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (1, 1, 384, 20) and (16, 384, 1, 1) are incompatible

My environment is a scrin/dev-spconv Docker container with CUDA 10.1 and the latest tensorflow / pytorch preinstalled. nvcc --version gives "Cuda compilation tools, release 10.1, V10.1.243", and pip list | grep tensor gives:

tensorboard            2.3.0
tensorboard-plugin-wit 1.7.0
tensorboardX           2.1
tensorflow             2.3.0
tensorflow-estimator   2.3.0

tyagi-iiitv commented 3 years ago

I'm not sure what's wrong here, never seen this one before. Do you mind dropping an email to anjul.ten@gmail.com? We can maybe figure this out over a quick call sometime. Thanks!

nschein commented 3 years ago

If you trained it prior to my pull request, remove the background class from the config file and change all corresponding class indices back to the previous version. You don't need the background class except for debugging purposes. I think this might be the cause of the error.
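
To check whether a config change like this is behind the error, it can help to compare the weight shapes the current model expects with the shapes stored in the checkpoint. Below is a minimal sketch of that comparison in plain Python. The helper `find_shape_mismatches` and the layer names are hypothetical and not part of this repository; only the two mismatching shapes are taken from the traceback above. With Keras, the expected shapes could be collected from `model.weights` (each variable has a `.name` and a `.shape`), and the saved shapes could be read from the HDF5 file, for example with h5py.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    model_shapes: Dict[str, Shape],
    saved_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, expected, saved) for each weight whose shapes differ.

    Weights that are missing from the checkpoint are skipped here; this
    sketch only reports shape conflicts.
    """
    mismatches = []
    for name, expected in model_shapes.items():
        saved = saved_shapes.get(name)
        if saved is not None and tuple(saved) != tuple(expected):
            mismatches.append((name, tuple(expected), tuple(saved)))
    return mismatches


# Hypothetical layer names. The "head/kernel" shapes are the two shapes
# reported in the ValueError; the "backbone/kernel" shape is made up.
model_shapes = {
    "backbone/kernel": (3, 3, 64, 64),
    "head/kernel": (1, 1, 384, 20),
}
saved_shapes = {
    "backbone/kernel": (3, 3, 64, 64),
    "head/kernel": (16, 384, 1, 1),
}

for name, expected, saved in find_shape_mismatches(model_shapes, saved_shapes):
    print(f"{name}: model expects {expected}, checkpoint has {saved}")
```

If the only conflicts are in layers whose size depends on the number of classes, that would point to a class-config mismatch between training and prediction.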

tyagi-iiitv commented 3 years ago

Closing this one for now.