shinya7y / UniverseNet

USB: Universal-Scale Object Detection Benchmark (BMVC 2022)
Apache License 2.0

RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1 #7

Closed Ericargus closed 4 years ago

Ericargus commented 4 years ago

I am using my own dataset, which has 10 classes, and I set num_classes=10 in the model config file. After a few epochs of training, I hit this error:

  File "/home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py", line 344, in forward
    return (pred * weight).sum()  # 0
RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1

Could you give me some suggestions? Thanks a lot.

P.S. My dataset works well with Cascade R-CNN.

shinya7y commented 4 years ago

This issue seems related to https://github.com/shinya7y/UniverseNet/issues/4 . The error occurs when training fails (e.g., when the loss diverges). I am replying on the assumption that you are fine-tuning from COCO weights.

(1) Please double-check that the load_from setting in your config is correct. A wrong setting for fine-tuning is a common mistake. Could you please paste your log? The lines from "load checkpoint from" to "Start running" are important for verification.
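For reference, a minimal sketch of the two config entries involved in fine-tuning, written in the mmdetection 2.x config style. The checkpoint path is a hypothetical placeholder, and the nesting of num_classes under bbox_head is an assumption based on the bbox_head.gfl_cls parameters that appear in the training log; adapt both to your own config.

```python
# Hypothetical fine-tuning fragment for an mmdetection-style config file.
# Only `load_from` and `num_classes` are named in this thread; the path
# below is a placeholder, not a file shipped with the repository.

# Override the number of classes of the detection head for the custom dataset.
model = dict(
    bbox_head=dict(
        num_classes=10,  # 10 classes, as in the reporter's dataset
    ),
)

# Initialize from COCO-pretrained weights instead of training from scratch.
load_from = 'checkpoints/coco_pretrained_weights.pth'  # placeholder path
```

With a setup like this, the "size mismatch" warnings for the classification layer are what one would expect to see in the log, because the COCO checkpoint was trained with 80 classes.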

(2) Please use settings for more stable training.

Single-stage detectors are more sensitive to the settings above.
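The specific settings recommended here are not preserved in this copy of the thread. As an illustration only, the fragment below shows three adjustments that are commonly used to stabilize training in mmdetection 2.x configs: a lower learning rate, a longer warmup, and gradient clipping. All numeric values are assumptions chosen for the example, not the maintainer's recommendations.

```python
# Illustrative mmdetection 2.x style config fragment for more stable training.
# Every value here is an assumption for the sake of the example.

# Lower learning rate (scale it with your total batch size).
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)

# Gradient clipping limits the size of each update when the loss spikes.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# Longer linear warmup before the regular step schedule starts.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[8, 11],
)
```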

shinya7y commented 4 years ago

The direct cause of the error seems to be a bug in mmdetection.

Please modify line 344 (in forward) of /home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py from

    return (pred * weight).sum()  # 0

to

    return (pred * weight.reshape(pred.shape[0], -1)).sum()  # 0
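To see why the reshape helps, here is a small pure-Python sketch of the trailing-dimension broadcasting rule that PyTorch applies to elementwise products. The shapes are an assumption: pred as (7, 4) and weight as (7,) are consistent with the error message, but the thread does not state them. The helper broadcast_shape is hypothetical and written only for this illustration.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast shape of shapes `a` and `b`, or raise ValueError.

    Shapes are compared from the last dimension backwards; a missing
    leading dimension is treated as size 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and x != 1 and y != 1:
            raise ValueError(f"cannot broadcast {a} with {b}: {x} vs {y}")
        # A size-1 dimension stretches to match the other operand.
        result.append(y if x == 1 else x)
    return tuple(reversed(result))


pred_shape = (7, 4)    # assumed: 7 boxes with 4 coordinates each
weight_shape = (7,)    # assumed: one weight per box

try:
    broadcast_shape(pred_shape, weight_shape)
except ValueError as err:
    # The trailing 4 is compared with 7, which is the reported mismatch.
    print("original line fails:", err)

# weight.reshape(pred.shape[0], -1) turns (7,) into (7, 1).
fixed_weight_shape = (pred_shape[0], 1)
print(broadcast_shape(pred_shape, fixed_weight_shape))  # (7, 4)
```

With the reshape, each per-box weight is stretched across the four coordinates of its box, so the product is well defined.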

shinya7y commented 4 years ago

Please pull the latest code from the master branch to get the fix.

Even if the error disappears, checking load_from and tuning the hyperparameters is still advisable, because line 344 is usually not reached when training is stable.

Ericargus commented 4 years ago

2020-09-06 22:36:30,802 - mmdet - INFO - load checkpoint from /home/detmodel/UniverseNet/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth
2020-09-06 22:36:30,889 - mmdet - WARNING - The model and loaded state dict do not match exactly

size mismatch for bbox_head.gfl_cls.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
size mismatch for bbox_head.gfl_cls.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([28]).

Sorry for the late reply. When I use the COCO pretrained weights, the error is gone.