Status: Closed (zymale closed this issue 3 years ago)
This is the method supported by the author of the paper. Can you tell us where it mismatches the original paper? If possible, please submit some evidence, such as intermediate outputs, that shows the mismatch.
I can't give the specific difference in the code yet, but I will debug, locate the specific problem, and give those lines. I use a custom dataset. When I use the code from https://github.com/dbolya/yolact, it runs normally. Here is the log:
2021-09-01 22:08:59,422 - mmdet - INFO - Distributed training: True
2021-09-01 22:08:59,722 - mmdet - INFO - Config:
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
img_size = 1024
model = dict(
    type='YOLACT',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=-1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
        zero_init_residual=False,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs='on_input',
        num_outs=5,
        upsample_cfg=dict(mode='bilinear')),
    bbox_head=dict(
        type='YOLACTHead',
        num_classes=1,
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            octave_base_scale=3,
            scales_per_octave=1,
            base_sizes=[8, 16, 32, 64, 128],
            ratios=[0.5, 1.0, 2.0],
            strides=[
                14.840579710144928, 29.257142857142856, 56.888888888888886,
                113.77777777777777, 204.8
            ],
            centers=[(7.420289855072464, 7.420289855072464),
                     (14.628571428571428, 14.628571428571428),
                     (28.444444444444443, 28.444444444444443),
                     (56.888888888888886, 56.888888888888886),
                     (102.4, 102.4)]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='CrossEntropyLoss',
            use_sigmoid=False,
            reduction='none',
            loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.5),
        num_head_convs=1,
        num_protos=32,
        use_ohem=True),
    mask_head=dict(
        type='YOLACTProtonet',
        in_channels=256,
        num_protos=32,
        num_classes=1,
        max_masks_to_train=100,
        loss_mask_weight=6.125),
    segm_head=dict(
        type='YOLACTSegmHead',
        num_classes=1,
        in_channels=256,
        loss_segm=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.4,
            min_pos_iou=0.0,
            ignore_iof_thr=-1,
            gt_max_assign_all=False),
        allowed_border=-1,
        pos_weight=-1,
        neg_pos_ratio=3,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        iou_thr=0.5,
        top_k=200,
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = '/media/zymale/F/shichengcheng/coco/'
img_norm_cfg = dict(
    mean=[123.68, 116.78, 103.94], std=[58.4, 57.12, 57.38], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile', to_float32=True),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)),
    dict(
        type='PhotoMetricDistortion',
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    dict(
        type='Expand',
        mean=[123.68, 116.78, 103.94],
        to_rgb=True,
        ratio_range=(1, 4)),
    dict(
        type='MinIoURandomCrop',
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
        min_crop_size=0.3),
    dict(type='Resize', img_scale=(1024, 1024), keep_ratio=False),
    dict(
        type='Normalize',
        mean=[123.68, 116.78, 103.94],
        std=[58.4, 57.12, 57.38],
        to_rgb=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.68, 116.78, 103.94],
                std=[58.4, 57.12, 57.38],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_train.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile', to_float32=True),
            dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
            dict(type='FilterAnnotations', min_gt_bbox_wh=(4.0, 4.0)),
            dict(
                type='PhotoMetricDistortion',
                brightness_delta=32,
                contrast_range=(0.5, 1.5),
                saturation_range=(0.5, 1.5),
                hue_delta=18),
            dict(
                type='Expand',
                mean=[123.68, 116.78, 103.94],
                to_rgb=True,
                ratio_range=(1, 4)),
            dict(
                type='MinIoURandomCrop',
                min_ious=(0.1, 0.3, 0.5, 0.7, 0.9),
                min_crop_size=0.3),
            dict(type='Resize', img_scale=(1024, 1024), keep_ratio=False),
            dict(
                type='Normalize',
                mean=[123.68, 116.78, 103.94],
                std=[58.4, 57.12, 57.38],
                to_rgb=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='DefaultFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_val.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.68, 116.78, 103.94],
                        std=[58.4, 57.12, 57.38],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='/media/zymale/F/shichengcheng/coco/annotations_val.json',
        img_prefix='/media/zymale/F/shichengcheng/coco/images/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1024, 1024),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=False),
                    dict(
                        type='Normalize',
                        mean=[123.68, 116.78, 103.94],
                        std=[58.4, 57.12, 57.38],
                        to_rgb=True),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0005)
optimizer_config = dict()
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.1,
    step=[40, 62, 100, 152])
runner = dict(type='EpochBasedRunner', max_epochs=200)
cudnn_benchmark = True
evaluation = dict(metric=['bbox', 'segm'])
work_dir = '/media/zymale/F/mmdetection/tools/work_dirs/work'
gpu_ids = range(0, 1)
/media/zymale/F/mmdetection/mmdet/core/anchor/builder.py:16: UserWarning: build_anchor_generator would be deprecated soon, please use build_prior_generator
'build_anchor_generator would be deprecated soon, please use '
2021-09-01 22:09:00,050 - mmcv - INFO - load model from: torchvision://resnet50
2021-09-01 22:09:00,050 - mmcv - INFO - Use load_from_torchvision loader
2021-09-01 22:09:00,206 - mmcv - WARNING - The model and loaded state dict do not match exactly
unexpected key in source state_dict: fc.weight, fc.bias
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
2021-09-01 22:09:02,265 - mmdet - INFO - Start running, host: zymale@zymale-MS-7A94, work_dir: /media/zymale/F/mmdetection/tools/work_dirs/work
2021-09-01 22:09:02,265 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_train_epoch:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) DistSamplerSeedHook
(NORMAL ) DistEvalHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
before_train_iter:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) DistEvalHook
(LOW ) IterTimerHook
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
after_train_epoch:
(NORMAL ) CheckpointHook
(NORMAL ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_val_epoch:
(NORMAL ) DistSamplerSeedHook
(NORMAL ) NumClassCheckHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
before_val_iter: (LOW ) IterTimerHook
after_val_iter: (LOW ) IterTimerHook
after_val_epoch: (VERY_LOW ) TextLoggerHook
2021-09-01 22:09:02,266 - mmdet - INFO - workflow: [('train', 1)], max: 200 epochs
/home/zymale/anaconda3/envs/torch_cuda10.2/lib/python3.6/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/media/zymale/F/mmdetection/mmdet/core/anchor/anchor_generator.py:323: UserWarning: grid_anchors would be deprecated soon. Please use grid_priors
warnings.warn('grid_anchors would be deprecated soon. '
/media/zymale/F/mmdetection/mmdet/core/anchor/anchor_generator.py:360: UserWarning: single_level_grid_anchors would be deprecated soon. Please use single_level_grid_priors
'single_level_grid_anchors would be deprecated soon. '
Traceback (most recent call last):
File "./train.py", line 190, in
ValueError: Expected input batch_size (65472) to match target batch_size (19248).
Environment: Ubuntu 20.04, CUDA 10.2. The problem is the same with PyTorch 1.5.1 and 1.6.
Thanks for the bug report. I will check it ASAP.
Sorry, I cannot reproduce the problem. You are right that there is an obvious logic error about valid anchors, but in YOLACT we resize all images to the same size and there is no padding-related operation in `train_pipeline`, so all anchors should be valid.
The error you met may not be related to this issue. Would you mind giving more details about it, especially the tensor shapes of `cls_score`, `labels`, and `label_weights`?
Yes, you are right. There is a mistake in the strides in config.py. I changed the image size to 1024, but for the strides I just used the defaults, which are only right for an image size of 550. The same goes for the centers in config.py.
I'm sorry for the trouble caused to you because of my mistake.
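For anyone hitting the same error, the stride mistake described above can be checked with plain arithmetic. The sketch below is not mmdet code. It assumes the FPN levels downsample by 8, 16, 32, 64 and 128 (not printed in the log), that each location carries 3 anchors (from the three `ratios`), and that an anchor counts as valid only if its grid cell index is below `ceil(img_size / stride)`. The helper names are hypothetical. Under those assumptions it reproduces both numbers in the ValueError.

```python
import math

ANCHORS_PER_LOCATION = 3  # ratios=[0.5, 1.0, 2.0] with scales_per_octave=1


def fpn_grid_sizes(img_size, true_strides=(8, 16, 32, 64, 128)):
    """Side length of each feature map (assumed downsampling factors)."""
    return [math.ceil(img_size / s) for s in true_strides]


def count_anchors(grid_sizes):
    """Total anchors over all levels for square feature maps."""
    return ANCHORS_PER_LOCATION * sum(n * n for n in grid_sizes)


def valid_grid_sizes(img_size, config_strides, grid_sizes):
    """Cells per side treated as valid: min(ceil(img / stride), grid)."""
    sizes = []
    for stride, n in zip(config_strides, grid_sizes):
        # round() guards against float noise such as 69.00000000000001
        cells = math.ceil(round(img_size / stride, 6))
        sizes.append(min(cells, n))
    return sizes


img_size = 1024
default_grids = [69, 35, 18, 9, 5]  # feature map sizes for a 550 input
stale_strides = [img_size / n for n in default_grids]  # 14.84..., as in the log

grids = fpn_grid_sizes(img_size)  # [128, 64, 32, 16, 8]
num_predictions = count_anchors(grids)  # 65472
num_targets = count_anchors(
    valid_grid_sizes(img_size, stale_strides, grids))  # 19248

# Strides and centers recomputed from the real feature map sizes
fixed_strides = [img_size / n for n in grids]  # [8.0, 16.0, 32.0, 64.0, 128.0]
fixed_centers = [(s / 2, s / 2) for s in fixed_strides]

print(num_predictions, num_targets, fixed_strides)
```

With strides recomputed from the feature map sizes of the new input, every grid cell is valid again and the two counts agree.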
mmdet version 2.15.1: in yolact_head.py, function `loss`, lines 176-192 filter out invalid anchors, so the GT is matched only against the valid anchors, but lines 196-232 use the predictions for all anchors. This causes a dimension mismatch error in the loss calculation.
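A toy illustration of the mismatch described here, in plain Python lists rather than tensors. It is not the code in yolact_head.py, and `select_valid` is a hypothetical helper. It only shows that targets built for valid anchors and predictions kept for all anchors end up with different lengths unless the same mask is applied to both.

```python
def select_valid(values, valid_flags):
    """Keep only the entries whose anchor is flagged as valid."""
    return [v for v, ok in zip(values, valid_flags) if ok]


# Five anchors, two of them invalid (made-up values for illustration)
valid_flags = [True, True, False, True, False]
cls_scores = [0.9, 0.2, 0.7, 0.4, 0.1]  # one prediction per anchor
labels = select_valid([1, 0, 0, 1, 0], valid_flags)  # targets for valid anchors only

# Predictions cover all anchors, targets only the valid ones: lengths differ
mismatch = len(cls_scores) != len(labels)

# Applying the same mask to the predictions restores matching lengths
cls_scores_valid = select_valid(cls_scores, valid_flags)
print(mismatch, len(cls_scores_valid), len(labels))
```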