Error when training with dcn_head

Hi @tianweiy,

Thanks for your great work! I am trying to run train.py but get an error similar to the one in #120. The error is as follows:

Traceback (most recent call last):
  File "tools/train.py", line 137, in <module>
    main()
  File "tools/train.py", line 132, in main
    logger=logger,
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/torchie/apis/train.py", line 327, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/torchie/trainer/trainer.py", line 543, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/torchie/trainer/trainer.py", line 418, in train
    self.call_hook("after_train_iter")
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/torchie/trainer/trainer.py", line 331, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/torchie/trainer/hooks/optimizer.py", line 18, in after_train_iter
    trainer.outputs["loss"].backward()
  File "/home/zwbai/anaconda3/envs/centerpoint/lib/python3.6/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/zwbai/anaconda3/envs/centerpoint/lib/python3.6/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
  File "/home/zwbai/anaconda3/envs/centerpoint/lib/python3.6/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/home/zwbai/anaconda3/envs/centerpoint/lib/python3.6/site-packages/torch/autograd/function.py", line 210, in wrapper
    outputs = fn(ctx, *args)
  File "/home/zwbai/Documents/CMM_Tracking/CenterPoint/det3d/ops/dcn/deform_conv.py", line 93, in backward
    cur_im2col_step)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

My environment is PyTorch 1.8.1 + CUDA 11.1 + cuDNN 8.0.5.

BTW, I also changed every .view() to .reshape(), but I still get the error. The error disappears if I set dcn_head = False or use a batch size of 1, and I don't understand why. I suspect the DCN code is responsible, but I don't know the exact cause or how to fix it. Could you please give me some suggestions?
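
For reference, here is a minimal plain-Python sketch of one possible reason a batch size of 1 could hide this kind of error. It is an assumption, not something verified against the code in det3d/ops/dcn. A view is only allowed when the requested shape is compatible with the tensor's memory layout, and a dimension of size 1 never breaks that layout, so a tensor whose batch and channel dimensions were swapped can still look contiguous when the batch is 1. The helper names (`contiguous_strides`, `is_contiguous`, `transpose`, `layout_after_batch_channel_swap`) and the example sizes are hypothetical and model only shapes and strides, not real tensors.

```python
def contiguous_strides(shape):
    """Row-major strides (in elements) for a densely packed tensor of this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Simplified row-major contiguity check.

    Size-1 dimensions are skipped because their stride never affects
    which element is addressed.
    """
    expected = 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        if size == 1:
            continue
        if stride != expected:
            return False
        expected *= size
    return True


def transpose(shape, strides, a, b):
    """Swap two dimensions without moving data (shape and strides only)."""
    shape, strides = list(shape), list(strides)
    shape[a], shape[b] = shape[b], shape[a]
    strides[a], strides[b] = strides[b], strides[a]
    return tuple(shape), tuple(strides)


def layout_after_batch_channel_swap(batch, channels=64, height=4, width=4):
    """Layout of an (N, C, H, W) tensor after swapping N and C (sizes are made up)."""
    shape = (batch, channels, height, width)
    return transpose(shape, contiguous_strides(shape), 0, 1)


if __name__ == "__main__":
    for batch in (1, 4):
        shape, strides = layout_after_batch_channel_swap(batch)
        print(batch, shape, strides, is_contiguous(shape, strides))
```

If something like this is what happens in the backward pass, the usual PyTorch remedy is to call `.contiguous()` on the offending tensor before it is viewed. Whether that applies here, and to which tensor, would still need to be checked in the DCN op itself.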
