Runtime Error - Githubissues

Atul997 commented 4 years ago

I am training model on custom data and using config file gfl_dcn_r101_ms2x.py. After 4 epochs I am getting following error -

File "tools/train.py", line 151, in <module> main() File "tools/train.py", line 147, in main meta=meta) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/apis/train.py", line 165, in train_detector runner.run(data_loaders, cfg.workflow, cfg.total_epochs) File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 359, in run epoch_runner(data_loaders[i], **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/runner.py", line 263, in train self.model, data_batch, train_mode=True, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/apis/train.py", line 75, in batch_processor losses = model(**data) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/data_parallel.py", line 153, in forward return self.module(*inputs[0], **kwargs[0]) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/core/fp16/decorators.py", line 49, in new_func return old_func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/models/detectors/base.py", line 147, in forward return self.forward_train(img, img_metas, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/models/detectors/single_stage.py", line 71, in forward_train *loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/core/fp16/decorators.py", line 127, in new_func return old_func(*args, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/models/anchor_heads/gfl_head.py", line 254, in loss cfg=cfg) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/core/utils/misc.py", line 24, in multi_apply return tuple(map(list, zip(*map_results))) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/models/anchor_heads/gfl_head.py", line 186, in loss_single avg_factor=1.0) File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__ result = self.forward(*input, **kwargs) File "/usr/local/lib/python3.6/dist-packages/mmdet-1.1.0+32863f2-py3.6-linux-x86_64.egg/mmdet/models/losses/iou_loss.py", line 200, in forward return (pred * weight).sum() # 0 RuntimeError: The size of tensor a (4) must match the size of tensor b (529) at non-singleton dimension 1 How to resolve this?

implus commented 4 years ago

It is likely that your classification label is not well assigned on your custom data. Please check the cls label carefully.

Atul997 commented 4 years ago

It is likely that your classification label is not well assigned on your custom data. Please check the cls label carefully.

If it is so, then it should stop after first epoch, why it ran upto 4 epoch?

implus commented 4 years ago

It is likely that your classification label is not well assigned on your custom data. Please check the cls label carefully.

If it is so, then it should stop after first epoch, why it ran upto 4 epoch?

If you are sure that you classification label is exactly right, I suggest to try:

weight_targets += 0.1
weight_targets[weight_targets > 1.0] = 1.0

in line180 of gfl_head.py.

By the way, I wonder if the validation performances for the first 3 epochs looks good. Can you provide the APs for the first 3 epochs?

implus / GFocal

Runtime Error #10