open-mmlab / mmdetection3d

OpenMMLab's next-generation platform for general 3D object detection.
https://mmdetection3d.readthedocs.io/en/latest/
Apache License 2.0
5.18k stars 1.52k forks source link

Bug in BaseInstance3DBoxes #277

Closed filaPro closed 3 years ago

filaPro commented 3 years ago

In my opinion the inplace modification of tensor in this line is dangerous for 2 reasons.

  1. It may tend to corrupt ground truth data in numpy format. It's hardly believable, but i'm going to explain. The thing is, torch.as_tensor function in this line doesn't protect the tensor argument of __init__ from further rewriting of the same memory, allocated by numpy. More info can be found in the official documentation of torch.as_tensor. For example, this constructor is called here in indoor_eval function. So after each validation step the z coordinate of gt_anno['gt_boxes_upright_depth'] may be overwritten. One may ask why it doesn't affect current evaluation protocol for SUNRGBD? The answer is, by coincidence these coordinates are now stored in np.float64. The bug would be present if that is np.float32.

  2. In many cases pytorch doesn't like inplace modifications of variables under the gradient. It tends to RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: ...

If I'm not mistaken something and you agree this is a bug, I can send a PR. The possible solution may be:

self.tensor = torch.cat((self.tensor[:, :3] + self.tensor[:, 3:6] * (dst - src), self.tensor[:, 3:]), dim=-1)
Tai-Wang commented 3 years ago

Thanks for your kind suggestion. I almost agree with you, but there may be only two concerns:

  1. I am not sure whether this inplace modification has affect some current implementation. So it should be able to pass the unit tests and also ensure the other implemented models will not be affected (for example, maybe need rebenchmark).
  2. For your given possible solution, I think .clone() should be added after self.tensor on the right side of the equation to avoid your mentioned RuntimError caused by inplace modifications?
filaPro commented 3 years ago

I'm now running without extra .clone(). Do we really need it? The proposed replacement of += to torch.cat already avoids this RuntimeError.

By the way, if you are going to replace += here, it also can be fixed in the translate method of this class.

ZwwWayne commented 3 years ago

Hi @filaPro , Thanks for you kind suggestion. Would you like to create a PR to fix that?

Tai-Wang commented 3 years ago

OK @filaPro . I just think .clone() is a general way to avoid the inplace RuntimeError and could protect the original tensor from being unintentionally changed. If without .clone() could work on your side, you can feel free to create a PR to fix that.

filaPro commented 3 years ago

Hi @Tai-Wang , I agree that .clone() is a better solution then replacing all inplace operations in all methods of BaseInstance3DBoxes and its descendants. So the PR is as simple as adding this .clone().