open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

After 2 epochs of training, I encountered a problem with the input image size #6280

Closed Riser6 closed 3 years ago

Riser6 commented 3 years ago

Dear mmdet team: Thanks for your wonderful work and huge contribution to the computer vision community! Recently, I have been using mmdetection to train my own new model on the VOC dataset. After one or two epochs of training, I ran into a problem with the input image size. According to the data pipeline settings, all images should be resized to 640×640, so when the network forward pass executes here, the feature map should be 20×20 (640 divided by the stride of 32), but instead it is 17×17, which raises an error.


```
2021-10-14 04:14:44,091 - mmdet - INFO - Exp name: yolox_m_mobilevit_s_voc.py
2021-10-14 04:14:44,092 - mmdet - INFO - Epoch(val) [2][4952] AP50: 0.0010, mAP: 0.0015
Traceback (most recent call last):
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/einops/einops.py", line 382, in reduce
    return recipe.apply(tensor)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/einops/einops.py", line 205, in apply
    backend.shape(tensor))
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/einops/einops.py", line 176, in reconstruct_from_shape
    length, known_product))
einops.EinopsError: Shape mismatch, can't divide axis of length 17 in chunks of 2
```

During handling of the above exception, another exception occurred:

```
Traceback (most recent call last):
  File "tools/train.py", line 189, in <module>
    main()
  File "tools/train.py", line 185, in main
    meta=meta)
  File "/data/wd/mmdetection/mmdet/apis/train.py", line 174, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/data/wd/mmdetection/mmdet/models/detectors/base.py", line 238, in train_step
    losses = self(**data)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/data/wd/mmdetection/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/data/wd/mmdetection/mmdet/models/detectors/single_stage.py", line 92, in forward_train
    x = self.extract_feat(img)
  File "/data/wd/mmdetection/mmdet/models/detectors/single_stage.py", line 53, in extract_feat
    x = self.backbone(img)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wd/mmdetection/mmdet/models/backbones/mobilevit.py", line 220, in forward
    x = self.mvit2
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data/wd/mmdetection/mmdet/models/backbones/mobilevit.py", line 156, in forward
    x = rearrange(x, 'b d (h ph) (w pw) -> b (ph pw) (h w) d', ph=self.ph, pw=self.pw)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/einops/einops.py", line 452, in rearrange
    return reduce(tensor, pattern, reduction='rearrange', **axes_lengths)
  File "/data/wd/anaconda3/envs/openmmlab/lib/python3.7/site-packages/einops/einops.py", line 390, in reduce
    raise EinopsError(message + '\n {}'.format(e))
einops.EinopsError: Error while processing rearrange-reduction pattern "b d (h ph) (w pw) -> b (ph pw) (h w) d".
 Input tensor shape: torch.Size([3, 240, 17, 17]). Additional info: {'ph': 2, 'pw': 2}.
 Shape mismatch, can't divide axis of length 17 in chunks of 2
```
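The failure can be reproduced outside the training loop. Below is a minimal sketch: the tensor shape and patch size are taken from the traceback above, while the zero-padding workaround at the end is only an illustration of how a backbone can be made robust to indivisible feature-map sizes, not the fix adopted in this thread:

```python
import torch
import torch.nn.functional as F
from einops import rearrange

# Feature map from the traceback: batch 3, 240 channels, 17x17 spatial size.
x = torch.randn(3, 240, 17, 17)
ph = pw = 2  # MobileViT patch size

# This is the call that fails, since 17 is not divisible by 2:
# rearrange(x, 'b d (h ph) (w pw) -> b (ph pw) (h w) d', ph=ph, pw=pw)
# einops.EinopsError: Shape mismatch, can't divide axis of length 17 in chunks of 2

# One possible workaround (an assumption for illustration): zero-pad the
# bottom/right of the feature map up to the next multiple of the patch size
# before unfolding it into patches.
pad_h = (ph - x.shape[2] % ph) % ph
pad_w = (pw - x.shape[3] % pw) % pw
x = F.pad(x, (0, pad_w, 0, pad_h))  # F.pad order: (left, right, top, bottom)
patches = rearrange(x, 'b d (h ph) (w pw) -> b (ph pw) (h w) d', ph=ph, pw=pw)
print(patches.shape)  # torch.Size([3, 4, 81, 240]) after padding to 18x18
```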

Riser6 commented 3 years ago

I hope anyone who has encountered a similar situation can help me, thanks!

RangiLyu commented 3 years ago

Would you like to upload your config file?

Riser6 commented 3 years ago

configs.zip

Riser6 commented 3 years ago

Thanks for your reply!

RangiLyu commented 3 years ago

When using the YOLOX augmentation, the SyncRandomSizeHook set in the config randomly resizes the input to between 14×32 and 26×32 pixels (i.e., 448 to 832) every epoch, so the input size is not always 640. For example, a randomly chosen size of 544 (17×32) produces a 17×17 feature map at stride 32, which the 2×2 patching in your MobileViT backbone cannot divide. You can try deleting this hook to use a fixed input size, as in the sketch below.
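For reference, here is a sketch of the `custom_hooks` list as it appears in the standard YOLOX config (`configs/yolox/yolox_s_8x8_coco.py`); the exact values in the uploaded configs.zip may differ, so treat this as an assumption about its layout. Removing the `SyncRandomSizeHook` entry keeps the input fixed at `img_scale`:

```python
# Sketch based on the standard mmdetection YOLOX config; values are assumptions.
img_scale = (640, 640)
num_last_epochs = 15
interval = 10

custom_hooks = [
    dict(type='YOLOXModeSwitchHook', num_last_epochs=num_last_epochs, priority=48),
    # Deleting (or commenting out) this hook disables the per-epoch random
    # resizing, so the backbone always sees img_scale (640x640) inputs:
    # dict(type='SyncRandomSizeHook', ratio_range=(14, 26), img_scale=img_scale, priority=48),
    dict(type='SyncNormHook', num_last_epochs=num_last_epochs, interval=interval, priority=48),
    dict(type='ExpMomentumEMAHook', resume_from=None, momentum=0.0001, priority=49),
]
```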

Riser6 commented 3 years ago

Okay! Thanks for your valuable guidance, I will modify it!