xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:
MIT License
7.27k stars 1.92k forks

Error when using a different image resolution #363

Open ruinianxu opened 5 years ago

ruinianxu commented 5 years ago

Hi all, I used resolution (227, 227, 3) images to train pose_dla_dcn but got this error:

```
Traceback (most recent call last):
  File "main.py", line 102, in <module>
    main(opt)
  File "main.py", line 70, in main
    log_dict_train, _ = trainer.train(epoch, train_loader)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/trains/base_trainer.py", line 119, in train
    return self.run_epoch('train', epoch, data_loader)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/trains/base_trainer.py", line 69, in run_epoch
    output, loss, loss_stats = model_with_loss(batch)
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/trains/base_trainer.py", line 19, in forward
    outputs = self.model(batch['input'])
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/models/networks/pose_dla_dcn.py", line 494, in forward
    x = self.base(x)
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/models/networks/pose_dla_dcn.py", line 307, in forward
    x = getattr(self, 'level{}'.format(i))(x)
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/models/networks/pose_dla_dcn.py", line 231, in forward
    x1 = self.tree1(x, residual)
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/models/networks/pose_dla_dcn.py", line 231, in forward
    x1 = self.tree1(x, residual)
  File "/home/ruinianxu/miniconda2/envs/CenterNet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ruinianxu/IVA_Lab/Project/GraspKpNet/src/lib/models/networks/pose_dla_dcn.py", line 73, in forward
    out += residual
RuntimeError: The expanded size of the tensor (29) must match the existing size (28) at non-singleton dimension 3
```

Does the resolution have to be even? Thanks in advance for any suggestions.
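For anyone hitting the same error: the 29 vs 28 mismatch falls out of standard output-size arithmetic when the input side isn't divisible by the network stride. A rough sketch of the effect, using plain size formulas rather than actual CenterNet code (the kernel/stride/padding values here are illustrative, not the exact DLA configuration):

```python
def conv_out(n, k=3, s=2, p=1):
    # output size of a conv layer: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

def pool_out(n, k=2, s=2):
    # output size of a max-pool layer: floor((n - k) / s) + 1
    return (n - k) // s + 1

for side in (227, 256):
    n = conv_out(conv_out(side))  # two stride-2 stages
    # At the next stride-2 stage, a conv branch and a pooled residual
    # branch can disagree when n is odd:
    print(side, '->', n, '->', conv_out(n), 'vs', pool_out(n))
    # 227 -> 57 -> 29 vs 28 (mismatch), 256 -> 64 -> 32 vs 32 (ok)
```

With 227 the intermediate size 57 is odd, so the two branches produce 29 and 28 and the residual add fails; with 256 every intermediate size stays even and the shapes agree.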

evitself commented 5 years ago

@ruinianxu This looks like a tensor shape mismatch. Did you change --input_res when launching training? See the DLA implementation: it uses feature maps up to c5, i.e. stride 32, so your input resolution needs to be divisible by 32. In your case, input_res could be set to 224 or 256, for example. You may also want to look at dataset/sample/ctdet.py to check whether the default data augmentation strategy fits your case.
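A minimal sketch of the fix described above: round the desired resolution up to the next multiple of the stride (32 for DLA). The helper name is hypothetical, not part of CenterNet:

```python
def next_valid_size(size, stride=32):
    # round up to the nearest multiple of the network stride
    # (hypothetical helper, not a CenterNet function)
    return ((size + stride - 1) // stride) * stride

print(next_valid_size(227))  # 256: 227 is not divisible by 32, round up
print(next_valid_size(224))  # 224: already a multiple of 32
```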

ruinianxu commented 5 years ago

@evitself I solved the problem by resizing the images to 256. Thank you for the suggestion. I have a question about data augmentation: when cropping the image, the function _get_border() takes 128 and the image shape as input. In my case, should I change 128 to 64?

evitself commented 5 years ago

@ruinianxu Yep, I think it's worth a try. _get_border() controls the random crop center used in the subsequent affine transformation. The strategy of randomly cropping a patch from the original image usually has a large impact on model performance for one-stage detectors, so I suggest running a set of experiments to verify this setting.

nobody-cheng commented 5 years ago

@ruinianxu Did changing it to 64 work?

Veronica1997 commented 4 years ago

> @ruinianxu This seems to be tensor shape mismatch issue, have you changed --input_res when you launch training? Please see dla implementation, it uses up to c5 feature map, that is stride 32, so you need to set your input resolution to be divisible by 32. For example, input_res can be set to 224 or 256 in your case. But I am wondering you might need to take a look at the dataset/sample/ctdet.py, see if default data augment strategy can be well fit in your case.

Thanks very much for solving my problem!