Ericargus closed this issue 4 years ago
This issue seems related to https://github.com/shinya7y/UniverseNet/issues/4. The error occurs when training fails (e.g., the loss diverges). My reply assumes you are fine-tuning from COCO weights.
(1) Please double-check that the load_from setting in your config is right. A wrong setting for fine-tuning is a common mistake.
Could you please paste your log? The lines from "load checkpoint from" to "Start running" are important for verification.
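For reference, a fine-tuning config in mmdetection 2.x style usually sets load_from at the top level and overrides num_classes in the detection head. The fragment below is only a sketch: the checkpoint path is a placeholder, and the nesting under bbox_head is an assumption based on the bbox_head.gfl_cls keys that GFL-style heads report in the log.

```python
# Hypothetical fine-tuning fragment (mmdetection 2.x config style).
# The path is a placeholder; point it at the COCO-pretrained checkpoint.
load_from = 'checkpoints/coco_pretrained.pth'

# Only the keys that differ from the base config are listed here.
model = dict(
    bbox_head=dict(
        num_classes=10,  # number of classes in the custom dataset
    ),
)
```

When num_classes differs from COCO's 80, a size-mismatch warning for the classification layer is expected in the log; mismatches on other layers would suggest that the wrong checkpoint or config is being used.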
(2) Please use settings for more stable training.
Single-stage detectors are more sensitive to the settings above.
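Typical knobs for stabilizing fine-tuning in an mmdetection 2.x config are a lower learning rate, a longer warmup, and gradient clipping. The values below are illustrative assumptions, not the settings recommended in this thread; they need tuning for your batch size and dataset.

```python
# Illustrative values only (assumptions, not tuned recommendations).

# A smaller learning rate than the from-scratch schedule is common for fine-tuning.
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)

# Clip gradients so that a single bad batch cannot blow up the weights.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

# Ramp the learning rate up linearly over the first iterations.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1000,
    warmup_ratio=0.001,
    step=[8, 11],
)
```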
The direct cause of the error seems to be a bug in mmdetection.
Please modify line 344 (in forward) of /home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py
from
return (pred * weight).sum() # 0
to
return (pred * weight.reshape(pred.shape[0], -1)).sum() # 0
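Why the reshape helps can be seen from the broadcasting rule alone. The sketch below is a pure-Python re-implementation of that rule on shape tuples (broadcast_shape is a hypothetical helper, not part of PyTorch or mmdetection). It assumes pred has shape (7, 4) and weight has shape (7,), which is consistent with the reported error message but is not confirmed by the thread.

```python
def broadcast_shape(a, b):
    """Return the broadcast shape of two shape tuples, or raise ValueError."""
    ndim = max(len(a), len(b))
    # Left-pad the shorter shape with 1s so that trailing dimensions line up.
    a = (1,) * (ndim - len(a)) + tuple(a)
    b = (1,) * (ndim - len(b)) + tuple(b)
    out = []
    for dim, (x, y) in enumerate(zip(a, b)):
        # Sizes are compatible only if they are equal or one of them is 1.
        if x != y and x != 1 and y != 1:
            raise ValueError(
                f"size {x} must match size {y} at non-singleton dimension {dim}")
        out.append(y if x == 1 else x)
    return tuple(out)


pred_shape = (7, 4)    # assumed: 7 boxes with 4 coordinates each
weight_shape = (7,)    # assumed: one weight per box

# (7,) is aligned as (1, 7), so dimension 1 compares 4 against 7 and fails.
try:
    broadcast_shape(pred_shape, weight_shape)
except ValueError as err:
    print("before the fix:", err)

# weight.reshape(pred.shape[0], -1) turns (7,) into (7, 1), which broadcasts.
print("after the fix:", broadcast_shape(pred_shape, (7, 1)))
```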
Please pull the latest code of the master branch to fix the bug.
Even if the error disappears, checking load_from and tuning the hyperparameters is still advisable, because line 344 is normally not reached when training is stable.
2020-09-06 22:36:30,802 - mmdet - INFO - load checkpoint from /home/detmodel/UniverseNet/universenet50_gfl_fp16_4x4_mstrain_480_960_2x_coco_20200729_epoch_24-c9308e66.pth
2020-09-06 22:36:30,889 - mmdet - WARNING - The model and loaded state dict do not match exactly
size mismatch for bbox_head.gfl_cls.weight: copying a param with shape torch.Size([80, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([28, 256, 3, 3]).
size mismatch for bbox_head.gfl_cls.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([28]).
Sorry for the late reply. When I use the COCO pretrained weights, the error is gone.
I use my own dataset, which has 10 classes, and I set num_classes=10 in the model config file. A few epochs after I start training, I get this error:
File "/home/detmodel/UniverseNet/mmdet/models/losses/iou_loss.py", line 344, in forward
return (pred * weight).sum() # 0
RuntimeError: The size of tensor a (4) must match the size of tensor b (7) at non-singleton dimension 1
Can you give me some suggestions? Thanks a lot.
P.S. My dataset works well with Cascade R-CNN.