open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

ATSS performance #2450

Closed — qianwangn closed this issue 4 years ago

qianwangn commented 4 years ago

R50 and R50_dcn have the same performance as https://github.com/sfzhang15/ATSS

But R101, R101_dcn, X101_32x8d_dcn, and X101_64x4d_dcn are 1.0 point lower than the paper claims.

qianwangn commented 4 years ago

That difference should come from multi-scale training. I'll train a new model to verify.

qianwangn commented 4 years ago

Using multi-scale training over [640, 800] is still 0.5 point lower than https://github.com/sfzhang15/ATSS. Another weird question: when using [400, 800] multi-scale training, the performance is the same as training at the single 800 scale.
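For reference, multi-scale training in mmdetection is configured through the `Resize` step of the train pipeline. A minimal sketch of the [640, 800] range setting described above (the surrounding pipeline steps are the usual defaults, not the reporter's exact config):

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# multi-scale training: with multiscale_mode='range', Resize samples the
# short side uniformly between 640 and 800 for every image
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 800)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
```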

hellock commented 4 years ago

Could you post more information, such as the config files you used?

PancakeAwesome commented 4 years ago

I use single-GPU training with the official ATSS architecture, an HRNetW18 backbone, image scale [1120, 920], lr 0.0025, batch size 2, and 2 workers. It is very hard to converge and the loss easily hits exploding gradients. Here is my config file.

```python
# model settings
model = dict(
    type='ATSS',
    pretrained='backbone/hrnetv2_w18.pth',
    backbone=dict(
        type='HRNet',
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(18, 36)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(18, 36, 72)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(18, 36, 72, 144)))),
    neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256),
    bbox_head=dict(
        type='ATSSHead',
        num_classes=4,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        octave_base_scale=8,
        scales_per_octave=1,
        anchor_ratios=[1.0],
        anchor_strides=[8, 16, 32, 64, 128],
        target_means=[.0, .0, .0, .0],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(type='ATSSAssigner', topk=9),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/xxx/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1120, 920), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),  # kwargs must be unpacked
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1120, 920),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'images/train',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val',
        pipeline=test_pipeline))
evaluation = dict(interval=10, metric=['bbox'])
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 100
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/xx/atss_hrw18_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 10)]
```

hellock commented 4 years ago

It seems that you modified a lot of the hyper-parameters.
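One hyper-parameter worth double-checking when moving to a single GPU is the learning rate. The mmdetection docs describe a linear scaling rule: the default lr=0.02 assumes 8 GPUs × 2 images per GPU, and the lr should be scaled proportionally to the total batch size. A sketch of that rule (variable names are illustrative, not part of the config API):

```python
# Linear scaling rule: lr = base_lr * (total batch size) / (base batch size)
base_lr, base_batch = 0.02, 16       # mmdetection default: 8 GPUs x 2 imgs/GPU
num_gpus, imgs_per_gpu = 1, 2        # the single-GPU setup above
lr = base_lr * (num_gpus * imgs_per_gpu) / base_batch   # -> 0.0025
optimizer = dict(type='SGD', lr=lr, momentum=0.9, weight_decay=0.0001)
```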

qianwangn commented 4 years ago

@hellock thanks for your reply. I changed to multi-scale training with 24 epochs and got higher performance than the paper claims. It seems multi-scale training needs more epochs.
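For anyone reproducing this, a 24-epoch run corresponds to mmdetection's standard 2x schedule. Relative to the 1x config above, the change amounts to the following (these are the stock 2x values, not confirmed by the reporter):

```python
# standard mmdetection 2x schedule: decay lr at epochs 16 and 22, stop at 24
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[16, 22])
total_epochs = 24
```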

lji72 commented 4 years ago

@johnlanbor can you share the resnet50-dcn and X101_32x8d_dcn config files?