RuntimeError: shape '[1, 1, 1024, 1024, 3]' is invalid for input of size 9437184

ttt0666 commented 1 year ago

Error in the backward pass. What should I do?

  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
    self.call_hook('after_train_iter')
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 36, in after_train_iter
    runner.outputs['loss'].backward()
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/torch/autograd/function.py", line 89, in apply
    return self._forward_cls.backward(self, *args)  # type: ignore
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/torch/autograd/function.py", line 210, in wrapper
    outputs = fn(ctx, *args)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 145, in backward
    im2col_step=cur_im2col_step)
RuntimeError: shape '[1, 1, 1024, 1024, 3]' is invalid for input of size 9437184
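A view/reshape only succeeds when the element counts match. The numbers in the error message show that the input holds exactly three times as many elements as the requested shape (9437184 = 9 × 1024 × 1024). This arithmetic only confirms the size mismatch; it does not by itself tell you which operation produced the wrong tensor.

```python
from math import prod

# Values copied from the RuntimeError message above.
target_shape = [1, 1, 1024, 1024, 3]
input_size = 9437184

# Number of elements the requested shape would need: 1*1*1024*1024*3.
target_numel = prod(target_shape)
ratio = input_size / target_numel

print(target_numel)  # 3145728
print(ratio)         # 3.0
```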

jiabeiwangTJU commented 1 year ago

Thanks for your interest! Maybe you can debug to find where the faulty tensor operation is. "deform_conv" is used in both the neck and roi_head/bbox_head, so please confirm where the error is raised.

ttt0666 commented 1 year ago

@jiabeiwangTJU When I set 'use_deform=False' in bbox_head, it works, so something is wrong with the deform conv.

The forward pass runs without errors, but this bug occurs when executing 'loss.backward()'. I still haven't resolved it.

There is also another bug in the inference phase:

Traceback (most recent call last):
  File "/DICL/mmdet/models/necks/fpn_single16_C45add.py", line 128, in forward
    new_input_32 = self.lateral_conv_32(inputs[2])
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 364, in forward
    self.dilation, self.groups, self.deform_groups)
  File "/root/miniconda3/envs/mmdetection_alignps/lib/python3.7/site-packages/mmcv/ops/deform_conv.py", line 91, in forward
    cur_im2col_step) == 0, 'im2col step must divide batchsize'
AssertionError: im2col step must divide batchsize

This is due to nms_post=300 in your config:

test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=6000,
        nms_post=300,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=16),

So the shape of inputs[2] in the neck is [300, 2048, 14, 6], and 300 % im2col_step != 0 with the default im2col_step=32, which causes this AssertionError.
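The check behind this AssertionError can be reproduced in plain Python. This is a minimal sketch that assumes the effective step is computed as min(im2col_step, batch_size) before the divisibility test, which is how the assertion in the traceback reads in mmcv 1.x; `check_im2col_step` is a hypothetical name used only for illustration.

```python
def check_im2col_step(batch_size: int, im2col_step: int) -> bool:
    """Hypothetical mirror of the 'im2col step must divide batchsize' check.

    Assumption: the effective step is min(im2col_step, batch_size).
    """
    cur_im2col_step = min(im2col_step, batch_size)
    return batch_size % cur_im2col_step == 0


# 300 proposals at test time (nms_post=300), with the two step values discussed here.
for step in (32, 512):
    print(step, 300 % min(step, 300), check_im2col_step(300, step))
# 32 12 False
# 512 0 True
```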

Is the value in your config correct, or is there another reason?

jiabeiwangTJU commented 1 year ago

Thanks again! In mmcv\ops\deform_conv.py, we changed the default value of im2col_step from 32 to 512. Sorry, we didn't mention this in the instructions; we'll add it to the README.md later.
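Assuming the effective step is min(im2col_step, batch_size), a default of 512 makes the effective step equal to the batch size for any batch of at most 512 (such as the 300 RoIs at test time), so the check always passes there. For larger batches, divisibility still matters. The helper below is hypothetical (not an mmcv API) and only illustrates how to find the largest step not exceeding a desired value that divides a given batch size.

```python
def largest_valid_im2col_step(batch_size: int, desired_step: int) -> int:
    """Hypothetical helper (not part of mmcv): largest divisor of
    batch_size that does not exceed desired_step."""
    if batch_size < 1 or desired_step < 1:
        raise ValueError("batch_size and desired_step must be positive")
    # Walk down from the largest candidate; 1 always divides batch_size.
    for step in range(min(desired_step, batch_size), 0, -1):
        if batch_size % step == 0:
            return step
    return 1


print(largest_valid_im2col_step(300, 32))   # 30
print(largest_valid_im2col_step(300, 512))  # 300
print(largest_valid_im2col_step(600, 512))  # 300
```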

jiabeiwangTJU / DICL, issue #2