RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1

shinya7y / UniverseNet

USB: Universal-Scale Object Detection Benchmark (BMVC 2022)

Apache License 2.0


RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1 #7

Closed: Ericargus closed this issue 4 years ago

Ericargus commented 4 years ago

I am using my own dataset, which has 10 classes, and I changed num_classes=10 in the model config file. After a few epochs of training, I hit this error:

File "/home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py", line 344, in forward
    return (pred * weight).sum() # 0
RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1

Can you give me some suggestions? Thanks a lot.

P.S. My dataset works well with Cascade R-CNN.

shinya7y commented 4 years ago

This issue seems related to https://github.com/shinya7y/UniverseNet/issues/4 . The error occurs when training fails (e.g., loss divergence). My reply assumes you are fine-tuning from COCO weights.

(1) Please double-check that the load_from setting in your config is correct. A wrong setting for fine-tuning is a common mistake. Could you please paste your log? The lines from "load checkpoint from" to "Start running" are important for verification.

(2) Please use settings for more stable training.

lower learning rate
larger batch size
longer warmup

Single-stage detectors are more sensitive to the settings above.
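As a rough illustration of the three settings above, the following sketch shows how they might look as overrides in an mmdetection 2.x-style Python config. The keys are standard config keys, but every value and the checkpoint path are assumptions chosen for illustration, not settings recommended in this thread.

```python
# Illustrative mmdetection 2.x-style config overrides.
# All values below are assumptions for illustration only.

# Hypothetical path: point this at the COCO-pretrained checkpoint you fine-tune from.
load_from = 'path/to/coco_pretrained_checkpoint.pth'

# Lower learning rate (assumed value).
optimizer = dict(type='SGD', lr=0.005, momentum=0.9, weight_decay=0.0001)

# Larger batch size per GPU (assumed value; limited by GPU memory).
data = dict(samples_per_gpu=8, workers_per_gpu=2)

# Longer linear warmup (assumed values), followed by a step schedule.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=2000,
    warmup_ratio=0.001,
    step=[16, 22])
```

Suitable values depend on the dataset and the number of GPUs, so treat these numbers as starting points to tune rather than as a known-good recipe.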

shinya7y commented 4 years ago

The direct cause of the error seems to be a bug in mmdetection.

Please modify line 344 (in forward) of /home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py from

return (pred * weight).sum() # 0

to

return (pred * weight.reshape(pred.shape[0], -1)).sum() # 0
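To see why the reshape helps, here is a minimal pure-Python sketch of the broadcasting rule that PyTorch applies to elementwise operations. It assumes pred has shape (7, 4) and weight has shape (7,); these shapes are inferred from the error message and are not stated in the thread. The helper broadcast_shape is hypothetical and only mimics the shape check, so it runs without PyTorch installed.

```python
def broadcast_shape(a, b):
    """Return the broadcast result shape of shapes a and b, or raise ValueError.

    Mimics the NumPy/PyTorch rule: shapes are aligned from the right, and each
    pair of sizes must either be equal or contain a 1.
    """
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so both have the same length.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}")
        out.append(y if x == 1 else x)
    return tuple(out)


pred_shape = (7, 4)   # assumed: 7 boxes, 4 coordinates each
weight_shape = (7,)   # assumed: one weight per box

# Original line: pred * weight. (7,) is treated as (1, 7), so 4 meets 7 at dim 1.
error_message = None
try:
    broadcast_shape(pred_shape, weight_shape)
except ValueError as err:
    error_message = str(err)

# Fixed line: weight.reshape(pred.shape[0], -1) turns 7 elements into shape (7, 1).
fixed_weight_shape = (pred_shape[0], 1)
result = broadcast_shape(pred_shape, fixed_weight_shape)

print(error_message)
print(result)
```

With the weight reshaped to a column, each box weight is broadcast across that box's four coordinates instead of being matched against them.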

shinya7y commented 4 years ago

Please pull the latest code from the master branch to get the bug fix.

Even if the error disappears, checking load_from and tuning the hyperparameters is still advisable, because line 344 is usually not reached when training is stable.

Ericargus commented 4 years ago

2020-09-06 22:36:30,802 - mmdet - INFO - load checkpoint from /home/detmodel/UniverseNet/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth
2020-09-06 22:36:30,889 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for bbox_head.gfl_cls.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
size mismatch for bbox_head.gfl_cls.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([28]).

Sorry for the late reply. When I use the COCO pretrained weights, the error is gone.
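For readers who want to reproduce the kind of check behind the "size mismatch" warning in the log above, here is a minimal sketch that compares parameter shapes by name. The helper find_shape_mismatches is hypothetical, and the backbone entry is made-up example data; the two bbox_head entries use the shapes reported in the log. With PyTorch, the shape dictionaries could presumably be built from the checkpoint's state dict and from model.state_dict(), but that step is left out here as an assumption.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters that
    exist in both dictionaries but have different shapes."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes for bbox_head.* are taken from the log; the backbone entry is hypothetical.
checkpoint_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "bbox_head.gfl_cls.weight": (80, 256, 3, 3),
    "bbox_head.gfl_cls.bias": (80,),
}
model_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "bbox_head.gfl_cls.weight": (28, 256, 3, 3),
    "bbox_head.gfl_cls.bias": (28,),
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt_shape, model_shape) in sorted(mismatches.items()):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

In this example only the classification-head parameters differ, which is the pattern the log shows; the remaining parameters match by shape.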