yolact error mismatch dims for prediction and gt

zymale commented 3 years ago

In mmdet version 2.15.1, the `loss` function in yolact_head.py filters out invalid anchors at lines 176-192, so the GT targets are matched only to the valid anchors, but lines 196-232 use the predictions for all anchors. This causes a dimension mismatch error in the loss calculation.

AronLin commented 3 years ago

This is a method supported by the author of the paper. Can you tell us the mismatch with the original paper? If possible, can you submit some evidence such as intermediate outputs to indicate the mismatch?

zymale commented 3 years ago

I can't point to the specific difference in the code yet, but I will debug, locate the specific problem, and post the relevant lines. I use a custom dataset. When I use the code from https://github.com/dbolya/yolact, it runs normally. Here is the log:

2021-09-01 22:08:59,422 - mmdet - INFO - Distributed training: True
2021-09-01 22:08:59,722 - mmdet - INFO - Config:
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
img_size = 1024
model = dict(
    type='YOLACT',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        zero_init_residual=False,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5,
        upsample_cfg=dict(mode='bilinear')),
    bbox_head=dict(
        type='YOLACTHead',
        num_classes=1,
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=3,
            scales_per_octave=1,
            base_sizes=[8, 16, 32, 64, 128],
            ratios=[0.5, 1.0, 2.0],
            strides=[
                14.840579710144928, 29.257142857142856, 56.888888888888886,
                113.77777777777777, 204.8
            ],
            centers=[(7.420289855072464, 7.420289855072464),
                     (14.628571428571428, 14.628571428571428),
                     (28.444444444444443, 28.444444444444443),
                     (56.888888888888886, 56.888888888888886),
                     (102.4, 102.4)]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            reduction='none',
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.5),
        num_head_convs=1,
        num_protos=32,
        use_ohem=True),
    mask_head=dict(
        type='YOLACTProtonet',
        in_channels=256,
        num_protos=32,
        num_classes=1,
        max_masks_to_train=100,
        loss_mask_weight=6.125),
    segm_head=dict(
        type='YOLACTSegmHead',
        num_classes=1,
        in_channels=256,
        loss_segm=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0.0,
            ignore_iof_thr=-1,
            gt_max_assign_all=False),
        allowed_border=-1,
        pos_weight=-1,
        neg_pos_ratio=3,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        iou_thr=0.5,
        top_k=200,
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = '/media/zymale/F/shichengcheng/coco/'
img_norm_cfg = dict(
    mean=[123.68, 116.78, 103.94], std=[58.4, 57.12, 57.38], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(
        type='Expand',
        mean=[123.68, 116.78, 103.94],
        to_rgb=True,
        ratio_range=(1, 4)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(1024, 1024), keep_ratio=False),
    dict(
        type='Normalize',
        mean=[123.68, 116.78, 103.94],
        std=[58.4, 57.12, 57.38],
        to_rgb=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.68, 116.78, 103.94],
                std=[58.4, 57.12, 57.38],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_train.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile', to_float32=True),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)),
            dict(
                type='PhotoMetricDistortion',
                brightness_delta=32,
                contrast_range=(0.5, 1.5),
                saturation_range=(0.5, 1.5),
                hue_delta=18),
            dict(
                type='Expand',
                mean=[123.68, 116.78, 103.94],
                to_rgb=True,
                ratio_range=(1, 4)),
            dict(
                type='MinIoURandomCrop',
                min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
                min_crop_size=0.3),
            dict(type='Resize', img_scale=(1024, 1024), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.68, 116.78, 103.94],
                std=[58.4, 57.12, 57.38],
                to_rgb=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_val.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.68, 116.78, 103.94],
                        std=[58.4, 57.12, 57.38],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_val.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.68, 116.78, 103.94],
                        std=[58.4, 57.12, 57.38],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.1,
    step=[40, 62, 100, 152])
runner = dict(type='EpochBasedRunner', max_epochs=200)
cudnn_benchmark = True
evaluation = dict(metric=['bbox', 'segm'])
work_dir = '/media/zymale/F/mmdetection/tools/work_dirs/work'
gpu_ids = range(0, 1)

/media/zymale/F/mmdetection/mmdet/core/anchor/builder.py:16: UserWarning: build_anchor_generator would be deprecated soon, please use build_prior_generator
  'build_anchor_generator would be deprecated soon, please use '
2021-09-01 22:09:00,050 - mmcv - INFO - load model from: torchvision://resnet50
2021-09-01 22:09:00,050 - mmcv - INFO - Use load_from_torchvision loader
2021-09-01 22:09:00,206 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2021-09-01 22:09:02,265 - mmdet - INFO - Start running, host: zymale@zymale-MS-7A94, work_dir: /media/zymale/F/mmdetection/tools/work_dirs/work
2021-09-01 22:09:02,265 - mmdet - INFO - Hooks will be executed in the following order:
before_run: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(VERY_LOW ) TextLoggerHook

before_train_epoch: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) DistSamplerSeedHook
(NORMAL ) DistEvalHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_train_iter: (VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) DistEvalHook
(LOW ) IterTimerHook

after_train_iter: (ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

after_train_epoch: (NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(VERY_LOW ) TextLoggerHook

before_val_epoch: (NORMAL ) DistSamplerSeedHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook

before_val_iter: (LOW ) IterTimerHook

after_val_iter: (LOW ) IterTimerHook

after_val_epoch: (VERY_LOW ) TextLoggerHook

2021-09-01 22:09:02,266 - mmdet - INFO - workflow: [('train', 1)], max: 200 epochs
/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/media/zymale/F/mmdetection/mmdet/core/anchor/anchor_generator.py:323: UserWarning: grid_anchors would be deprecated soon. Please use grid_priors
  warnings.warn('grid_anchors would be deprecated soon. '
/media/zymale/F/mmdetection/mmdet/core/anchor/anchor_generator.py:360: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors
  'single_level_grid_anchors would be deprecated soon. '
Traceback (most recent call last):
  File "./train.py", line 190, in <module>
    main()
  File "./train.py", line 186, in main
    meta=meta)
  File "/media/zymale/F/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/parallel/distributed.py", line 53, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/media/zymale/F/mmdetection/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/media/zymale/F/mmdetection/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/media/zymale/F/mmdetection/mmdet/models/detectors/yolact.py", line 74, in forward_train
    *bbox_head_loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
    return old_func(*args, **kwargs)
  File "/media/zymale/F/mmdetection/mmdet/models/dense_heads/yolact_head.py", line 236, in loss
    num_total_samples=num_total_pos)
  File "/media/zymale/F/mmdetection/mmdet/core/utils/misc.py", line 29, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/media/zymale/F/mmdetection/mmdet/models/dense_heads/yolact_head.py", line 269, in loss_single_OHEM
    loss_cls_all = self.loss_cls(cls_score, labels, label_weights)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media/zymale/F/mmdetection/mmdet/models/losses/cross_entropy_loss.py", line 249, in forward
    **kwargs)
  File "/media/zymale/F/mmdetection/mmdet/models/losses/cross_entropy_loss.py", line 41, in cross_entropy
    ignore_index=ignore_index)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/functional.py", line 2422, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/functional.py", line 2216, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (65472) to match target batch_size (19248).
Traceback (most recent call last):
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/zymale/anaconda3/envs/torch_cuda10.2/bin/python', '-u', './train.py', '--local_rank=0', '../configs/yolact/yolact_r50_1x8_coco.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

ValueError: Expected input batch_size (65472) to match target batch_size (19248).

Environment: Ubuntu 20.04, CUDA 10.2. The problem is the same with PyTorch 1.5.1 and 1.6.

jshilong commented 3 years ago

Thanks for reporting this bug. I will check it ASAP.

jshilong commented 3 years ago

Sorry, I cannot reproduce the problem. You are right that there is an obvious logic error in the handling of valid anchors. However, in YOLACT we resize all images to the same size, and there is no padding-related operation in train_pipeline, so all anchors should be valid. The error you hit may not be related to this issue. Would you mind giving more details about it, especially the tensor shapes of `cls_score`, `labels` and `label_weights`?
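One way to collect the requested shapes is a small check placed just before the classification loss call. The sketch below is only an illustration and not part of mmdet: `check_loss_shapes` is a hypothetical helper that takes plain shape tuples (for a tensor `t`, `tuple(t.shape)`), and the class dimension of 2 in the usage is an assumption. Only the counts 65472 and 19248 come from the traceback.

```python
def check_loss_shapes(cls_score_shape, labels_shape, label_weights_shape):
    """Return a list of readable mismatches (empty if the shapes agree).

    Hypothetical helper: expects cls_score as (N, C) and both labels
    and label_weights as (N,).
    """
    problems = []
    n_pred = cls_score_shape[0]
    if labels_shape[0] != n_pred:
        problems.append(
            f"cls_score has {n_pred} rows but labels has {labels_shape[0]}")
    if label_weights_shape[0] != labels_shape[0]:
        problems.append(
            f"labels has {labels_shape[0]} entries but label_weights has "
            f"{label_weights_shape[0]}")
    return problems


# Shapes consistent with the ValueError in the log; the class
# dimension (2) is assumed for illustration only.
report = check_loss_shapes((65472, 2), (19248,), (19248,))
for line in report:
    print(line)
```

An empty list means the predictions and targets cover the same number of anchors.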

zymale commented 3 years ago

Yes, you are right. There is a mistake in the strides in my config.py. I changed the image size to 1024, but the strides I used are the defaults, which are only right for an image size of 550. The same goes for the centers in config.py.
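The two numbers in the ValueError can be reproduced with a little arithmetic. The sketch below rests on assumptions: each stride-2 stage maps a side length n to ceil(n / 2), the first FPN level sits three halvings below the input, and there are 3 anchors per location (one scale, three ratios, as in the posted config). The helper names are hypothetical. The strides in the posted config equal 1024 / 69, 1024 / 35, 1024 / 18, 1024 / 9 and 1024 / 5, that is, they assume the 69, 35, 18, 9, 5 grids of a 550-pixel input.

```python
import math

NUM_BASE_ANCHORS = 3  # one scale x three ratios, as in the posted config


def fpn_grid_sizes(img_size, num_levels=5, halvings_before_first_level=3):
    """Side length of each FPN level, assuming n -> ceil(n / 2) per stage."""
    size = img_size
    for _ in range(halvings_before_first_level):
        size = math.ceil(size / 2)
    sizes = []
    for _ in range(num_levels):
        sizes.append(size)
        size = math.ceil(size / 2)
    return sizes


def num_anchors(grid_sizes):
    """Total anchors over all levels for square grids."""
    return NUM_BASE_ANCHORS * sum(s * s for s in grid_sizes)


grids_550 = fpn_grid_sizes(550)    # [69, 35, 18, 9, 5]
grids_1024 = fpn_grid_sizes(1024)  # [128, 64, 32, 16, 8]

print(num_anchors(grids_1024))  # 65472, the prediction count in the error
print(num_anchors(grids_550))   # 19248, the target count in the error

# Strides and centers that match the actual 1024-pixel feature maps.
strides_1024 = [1024 / s for s in grids_1024]   # [8.0, 16.0, 32.0, 64.0, 128.0]
centers_1024 = [(st / 2, st / 2) for st in strides_1024]
```

Under these assumptions the head predicts on the 128, 64, 32, 16, 8 grids, while the stale strides mark only a 69, 35, 18, 9, 5 portion of each grid as valid, which would explain the 65472 versus 19248 mismatch.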

zymale commented 3 years ago

I'm sorry for the trouble caused to you because of my mistake.

open-mmlab / mmdetection

yolact error mismatch dims for prediction and gt #5994