Shape issue in AdaptiveConv

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] I have read the FAQ documentation but cannot get the expected help.
[X] The bug has not been fixed in the latest version (master) or latest version (3.x).

Task

I have modified the scripts/configs, or I'm working on my own tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

MMCV 3.x, MMDet 2.1.0

Reproduces the problem - code sample

_base_ = [
    'faster-rcnn_r50_fpn.py',
]

find_unused_parameters=True
rpn_weight = 0.9
model = dict(
    type='FasterRCNN',
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=4),
    rpn_head=dict(
        _delete_=True,
        type='CRPNHead',
        num_stages=2,
        stages=[
            dict(
                type='StageRefineRPNHead',
                in_channels=256,
                feat_channels=256,
                anchor_generator=dict(
                    type='AnchorGenerator',
                    scales=[2],
                    ratios=[1.0],
                    strides=[4, 8, 16, 32]),
                refine_reg_factor=200.0,
                refine_cfg=dict(type='dilation', dilation=3),
                refined_feature=True,
                sampling=False,
                with_cls=False,
                reg_decoded_bbox=True,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=(.0, .0, .0, .0),
                    target_stds=(0.1, 0.1, 0.5, 0.5)),
                loss_bbox=dict(
                    type='IoULoss', linear=True,
                    loss_weight=10.0 * rpn_weight)),
            dict(
                type='StageRefineRPNHead',
                in_channels=256,
                feat_channels=256,
                refine_cfg=dict(type='offset'),
                refined_feature=True,
                sampling=True,
                with_cls=True,
                reg_decoded_bbox=True,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=(.0, .0, .0, .0),
                    target_stds=(0.05, 0.05, 0.1, 0.1)),
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=True,
                    loss_weight=1.0 * rpn_weight),
                loss_bbox=dict(
                    type='IoULoss', linear=True,
                    loss_weight=10.0 * rpn_weight))]),
    # model training and testing settings
    train_cfg=dict(
        rpn=[
            dict(
                assigner=dict(
                    type='DynamicAssigner',
                    low_quality_iou_thr=0.2,
                    base_pos_iou_thr=0.25,
                    neg_iou_thr=0.15),
                allowed_border=-1,
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.3,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=256,
                    pos_fraction=0.5,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=False),
                allowed_border=-1,
                pos_weight=-1,
                debug=False)
        ],
        rpn_proposal=dict(max_per_img=300, nms=dict(iou_threshold=0.8)),
        rcnn=dict(
            assigner=dict(
                pos_iou_thr=0.50, neg_iou_thr=0.50, min_pos_iou=0.50),
            sampler=dict(type='RandomSampler', num=256, pos_fraction=0.5))),
    test_cfg=dict(
        rpn=dict(max_per_img=300, nms=dict(iou_threshold=0.5)),
        rcnn=dict(score_thr=0.05))
)

Reproduces the problem - command or script

import os.path as osp

from mmengine.config import Config
from mmengine.runner import Runner

config = 'mmdetection/projects/ViTDet/configs/vitdet-cfinet.py'
cfg = Config.fromfile(config)
cfg.work_dir = osp.join('./work_dirs', osp.splitext(osp.basename(config))[0] + '-cfinet')

runner = Runner.from_cfg(cfg)
runner.train()

Reproduces the problem - error message

File ~/mmdetection/mmdet/models/dense_heads/cascade_rpn_head.py:99, in AdaptiveConv.forward(self, x, offset)
     96 assert H * W == offset.shape[1]
     97 # reshape [N, NA, 18] to (N, 18, H, W)
     98 # [1, 16384, 18] -> [1, 18, 16384]
---> 99 offset = offset.permute(0, 2, 1).reshape(N, -1, H, W)
    100 offset = offset.contiguous()
    101 x = self.conv(x, offset)

RuntimeError: shape '[8, -1, 128, 128]' is invalid for input of size 294912

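The offset has shape (1, 128*128, 18), as expected for a single image, but AdaptiveConv.forward tries to reshape it to (8, -1, 128, 128), where 8 is the batch size. Since 1 * 128 * 128 * 18 = 294912 elements cannot fill 8 * 128 * 128 slots evenly, the reshape fails. Training works with a batch size of 1, but with a batch size > 1 the offset's batch dimension stays at 1 and does not seem to be expanded to match the input.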
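The numbers in the error message can be checked without PyTorch. This is a minimal sketch in plain Python: `inferred_dim` is a hypothetical helper written only for this illustration (it is not part of mmdet), and it only mirrors the element-count rule that `reshape(N, -1, H, W)` has to satisfy.

```python
from math import prod


def inferred_dim(input_shape, n, h, w):
    """Size the -1 axis would take in reshape(n, -1, h, w), or None if invalid.

    Hypothetical helper for illustration only; it checks that the total
    element count is divisible by n * h * w.
    """
    total = prod(input_shape)
    known = n * h * w
    if total % known != 0:
        return None
    return total // known


H = W = 128
LAST_DIM = 18  # last offset dimension, taken from the traceback comment

# Offset as observed in the error: batch dimension is 1.
observed = (1, H * W, LAST_DIM)
print(prod(observed))                    # 294912, the size in the error message
print(inferred_dim(observed, 8, H, W))   # None: cannot reshape to (8, -1, 128, 128)
print(inferred_dim(observed, 1, H, W))   # 18: works with batch size 1

# Offset with a batch dimension that matches the input (assumed shape).
matching = (8, H * W, LAST_DIM)
print(inferred_dim(matching, 8, H, W))   # 18
```

This suggests the reshape would only succeed if the offset carried one entry per image in the batch, i.e. shape (8, 128*128, 18) here.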

Additional information

No response

shaunyuan22 / CFINet