qianwangn closed this issue 4 years ago
That difference should come from multi-scale training. I'll train a new model to confirm.
Using multi-scale training from [640, 800] is still 0.5 points lower than https://github.com/sfzhang15/ATSS. Another weird question: when using [400, 800] multi-scale training, the performance is the same as training at a single scale of 800.
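For readers unfamiliar with what a range like [640, 800] means here, the following is a minimal sketch of range-style multi-scale resizing: a short-side target is drawn per image, and the image is rescaled with its aspect ratio kept. The function names `sample_scale` and `rescale_keep_ratio` are hypothetical, and the long-side cap of 1333 is an assumption (the common COCO setting), not a value stated in this thread.

```python
import random


def sample_scale(short_range=(640, 800), max_long=1333, rng=random):
    """Draw one (max_long, max_short) target; the short side is sampled
    uniformly from the inclusive range. max_long=1333 is an assumption."""
    short = rng.randint(short_range[0], short_range[1])
    return max_long, short


def rescale_keep_ratio(w, h, scale):
    """Resize (w, h) so neither the long-side nor the short-side cap is
    exceeded, keeping the aspect ratio."""
    max_long, max_short = max(scale), min(scale)
    factor = min(max_long / max(w, h), max_short / min(w, h))
    return int(w * factor + 0.5), int(h * factor + 0.5)


if __name__ == "__main__":
    random.seed(0)
    for _ in range(3):
        scale = sample_scale()
        print(scale, rescale_keep_ratio(640, 480, scale))
```

Note that for wide images the long-side cap can be the binding constraint, in which case different short-side targets produce the same output size.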
Could you post more information such as config files you used?
I'm training on a single GPU with the official ATSS architecture, an HRNet-W18 backbone, image scale [1102, 920], lr 0.0025, batch size 2, and 2 workers. It is very hard to converge, and the loss easily blows up with exploding gradients. Here is my config file.
```python
# model settings
model = dict(
    type='ATSS',
    pretrained='backbone/hrnetv2_w18.pth',
    backbone=dict(
        type='HRNet',
        extra=dict(
            stage1=dict(
                num_modules=1,
                num_branches=1,
                block='BOTTLENECK',
                num_blocks=(4, ),
                num_channels=(64, )),
            stage2=dict(
                num_modules=1,
                num_branches=2,
                block='BASIC',
                num_blocks=(4, 4),
                num_channels=(18, 36)),
            stage3=dict(
                num_modules=4,
                num_branches=3,
                block='BASIC',
                num_blocks=(4, 4, 4),
                num_channels=(18, 36, 72)),
            stage4=dict(
                num_modules=3,
                num_branches=4,
                block='BASIC',
                num_blocks=(4, 4, 4, 4),
                num_channels=(18, 36, 72, 144)))),
    neck=dict(type='HRFPN', in_channels=[18, 36, 72, 144], out_channels=256),
    bbox_head=dict(
        type='ATSSHead',
        num_classes=4,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        octave_base_scale=8,
        scales_per_octave=1,
        anchor_ratios=[1.0],
        anchor_strides=[8, 16, 32, 64, 128],
        target_means=[.0, .0, .0, .0],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))
# training and testing settings
train_cfg = dict(
    assigner=dict(type='ATSSAssigner', topk=9),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.6),
    max_per_img=100)
# dataset settings
dataset_type = 'CocoDataset'
data_root = 'data/xxx/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1120, 920), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1120, 920),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/train.json',
        img_prefix=data_root + 'images/train',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotations/val.json',
        img_prefix=data_root + 'images/val',
        pipeline=test_pipeline))
evaluation = dict(interval=10, metric=['bbox'])
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# logging
log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook'),
    ])
# runtime settings
total_epochs = 100
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/xx/atss_hrw18_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 10)]
```
Seems that you modified lots of hyper-parameters.
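Two of the modified hyper-parameters can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the thread: it applies the common linear scaling rule for the learning rate and evaluates a step schedule per epoch. The helper names `scaled_lr` and `lr_at_epoch` are hypothetical, and the reference point (lr 0.01 at a total batch size of 16) is an assumption that should be replaced with whatever the baseline config actually uses.

```python
def scaled_lr(base_lr, base_batch, gpus, imgs_per_gpu):
    """Linear scaling rule: lr grows in proportion to total batch size."""
    return base_lr * (gpus * imgs_per_gpu) / base_batch


def lr_at_epoch(base_lr, epoch, steps, gamma=0.1):
    """Step policy: multiply lr by gamma once for every step epoch that
    has been reached (epochs are counted from 0)."""
    drops = sum(1 for s in steps if epoch >= s)
    return base_lr * gamma ** drops


if __name__ == "__main__":
    # Assumed reference: lr 0.01 at batch size 16 (check your baseline).
    print(scaled_lr(0.01, 16, gpus=1, imgs_per_gpu=2))

    # With step=[8, 11] and total_epochs=100 as in the posted config,
    # count how many epochs run at the fully decayed learning rate.
    steps, total_epochs = [8, 11], 100
    lowest = sum(1 for e in range(total_epochs) if e >= max(steps))
    print(lowest, "of", total_epochs, "epochs use the lowest lr")
```

With `step=[8, 11]` left unchanged while `total_epochs` is raised to 100, most of the run happens after the last decay, so the step epochs are normally moved along with the total epoch count.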
@hellock thanks for your reply. I changed to multi-scale training with 24 epochs and got higher performance than the paper claims. It seems multi-scale training needs more epochs.
@johnlanbor can you share the config.py you used for resnet50-dcn and X101_32x8d_dcn?
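As an illustration of that change, here is a hedged sketch of how multi-scale training with a 24-epoch schedule is typically written in an mmdetection-style config of this era. The exact values used by the poster are not given in the thread: the scale range `[(1333, 640), (1333, 800)]` and the step epochs `[16, 22]` are assumptions based on common settings.

```python
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    # Multi-scale: the short side is sampled per image between 640 and 800
    # (assumed range); the long side is capped at 1333 (assumed).
    dict(
        type='Resize',
        img_scale=[(1333, 640), (1333, 800)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

# 24-epoch schedule: the lr drop epochs are moved with the total length
# (assumed values, twice the step=[8, 11] of a 12-epoch schedule).
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[16, 22])
total_epochs = 24
```

The test pipeline would usually stay at a single scale so that results remain comparable with single-scale baselines.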
R50 and R50_dcn have the same performance as https://github.com/sfzhang15/ATSS
But R101, R101_dcn, X101_32x8d_dcn, and X101_64x4d_dcn are 1.0 point lower than the paper claims.