open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

different result with the same training twice #4263

Closed yustaub closed 3 years ago

yustaub commented 3 years ago

Hi, sir. I used cascade_rcnn_r101_fpn to train on my dataset and then used the trained detector for a tracking task. I trained the detector twice with the same config and did not change the tracker. The first run gets MOTA 71 and the second gets MOTA 69. Is there that much randomness in training? The only random step I can find in train_pipeline is RandomFlip; does this cause the difference? I would really appreciate your reply!

yustaub commented 3 years ago

I evaluated the two models (both at epoch 12) with the same config on the same val data. Here are the eval results:

first

Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.136
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.256
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.129
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.232
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.127
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.162
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.207
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300  ] = 0.207
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.207
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.410
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.292
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.198

+------------+-------+----------+-------+----------+-------+
| category   | AP    | category | AP    | category | AP    |
+------------+-------+----------+-------+----------+-------+
| pedestrian | 0.213 | car      | 0.574 | cyclist  | 0.120 |
| van        | 0.264 | truck    | 0.023 | person   | 0.000 |
| tram       | 0.000 | misc     | 0.013 | dontcare | 0.019 |
+------------+-------+----------+-------+----------+-------+

second

Average Precision (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.143
Average Precision (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.264
Average Precision (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = 0.135
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.267
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.137
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.168
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100  ] = 0.215
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300  ] = 0.215
Average Recall    (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.215
Average Recall    (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.397
Average Recall    (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.307
Average Recall    (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.205

+------------+-------+----------+-------+----------+-------+
| category   | AP    | category | AP    | category | AP    |
+------------+-------+----------+-------+----------+-------+
| pedestrian | 0.218 | car      | 0.575 | cyclist  | 0.143 |
| van        | 0.289 | truck    | 0.029 | person   | 0.000 |
| tram       | 0.000 | misc     | 0.013 | dontcare | 0.017 |
+------------+-------+----------+-------+----------+-------+
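To see where the gap between the two runs comes from, the per-category AP values from the two tables above can be compared directly. The following is only a small sketch: the dictionaries are copied by hand from the tables, and the variable names are illustrative, not part of MMDetection.

```python
# Per-category AP copied from the two evaluation tables above.
first = {
    "pedestrian": 0.213, "car": 0.574, "cyclist": 0.120,
    "van": 0.264, "truck": 0.023, "person": 0.000,
    "tram": 0.000, "misc": 0.013, "dontcare": 0.019,
}
second = {
    "pedestrian": 0.218, "car": 0.575, "cyclist": 0.143,
    "van": 0.289, "truck": 0.029, "person": 0.000,
    "tram": 0.000, "misc": 0.013, "dontcare": 0.017,
}

# The mean over categories should match the reported overall AP.
mean_first = sum(first.values()) / len(first)
mean_second = sum(second.values()) / len(second)

# Signed per-category change from the first run to the second.
deltas = {name: round(second[name] - first[name], 3) for name in first}
largest = max(deltas, key=lambda name: abs(deltas[name]))

print(f"mean AP: first={mean_first:.3f}, second={mean_second:.3f}")
for name, delta in sorted(deltas.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>10}: {delta:+.3f}")
print("largest change:", largest)
```

The category means reproduce the reported overall AP (0.136 and 0.143), and most of the difference sits in two categories, van and cyclist, while car and pedestrian barely move.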

training config

model = dict(
    type='CascadeRCNN',
    pretrained='torchvision://resnet101',
    backbone=dict(
        type='ResNet',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=9,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=9,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=9,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0.0, 0.0, 0.0, 0.0],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]))
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            match_low_quality=True,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=[
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.6,
                neg_iou_thr=0.6,
                min_pos_iou=0.6,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False),
        dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.7,
                min_pos_iou=0.7,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)
    ])
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5),
        max_per_img=100))
dataset_type = 'CocoDataset'
data_root = 'data/coco/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_train2017.json',
        img_prefix='data/coco/train2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ]),
    val=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]),
    test=dict(
        type='CocoDataset',
        ann_file='data/coco/annotations/instances_val2017.json',
        img_prefix='data/coco/val2017/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=(1333, 800),
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(type='RandomFlip'),
                    dict(
                        type='Normalize',
                        mean=[123.675, 116.28, 103.53],
                        std=[58.395, 57.12, 57.375],
                        to_rgb=True),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img'])
                ])
        ]))
evaluation = dict(interval=1, metric='bbox')
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
total_epochs = 12
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = 'checkpoints/cascade_rcnn_r101_fpn_1x_coco_20200317-0b6a2fbf.pth'
resume_from = None
workflow = [('train', 1)]
work_dir = './work_dirs/cascade_rcnn_r101_fpn_1x_coco'
gpu_ids = range(0, 1)

Can anyone help explain this?

borgarpa commented 3 years ago

Yes, RandomFlip is one source of variation, since which images get flipped differs from one training run to the next. As far as I know, neural network training has a stochastic component, so you can't perfectly predict the outcome of a training session. Think of a neural net as a bit like a chaotic system whose evolution is not fully determined. Besides, some optimization and regularization techniques, such as SGD (random mini-batch order) or Dropout, introduce randomness as well.

However, I think you can set a random seed so that these random components have less effect on the model's results.
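As a rough illustration of what setting a seed means, here is a minimal sketch. `seed_everything` and `flip_decisions` are hypothetical helpers written for this example, not MMDetection functions; NumPy and PyTorch are seeded only if they happen to be installed, and `flip_decisions` is a toy stand-in for the per-image draw that a random flip makes.

```python
import random


def seed_everything(seed: int, deterministic: bool = False) -> None:
    """Seed the RNGs commonly involved in training (hypothetical helper)."""
    random.seed(seed)  # Python's built-in RNG
    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's global RNG
    except ImportError:
        pass
    try:
        import torch
    except ImportError:
        return
    torch.manual_seed(seed)  # CPU (and default CUDA) generators
    torch.cuda.manual_seed_all(seed)  # all GPUs; a no-op without CUDA
    if deterministic:
        # Ask cuDNN for deterministic kernels; this can slow training down.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


def flip_decisions(n: int, flip_ratio: float = 0.5) -> list:
    """Toy stand-in for a random flip: one flip/no-flip draw per image."""
    return [random.random() < flip_ratio for _ in range(n)]


seed_everything(0)
run_a = flip_decisions(8)
seed_everything(0)
run_b = flip_decisions(8)
print(run_a == run_b)  # True: same seed, same sequence of draws
```

As far as I know, MMDetection 2.x's tools/train.py also accepts `--seed` and `--deterministic` options that do something similar. Even with a fixed seed, some GPU operations may remain nondeterministic, so small differences between runs can still appear.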

Anyway, you may want to take a look at this blog.

Hope it helps!

yustaub commented 3 years ago

Thanks for your reply, I really appreciate it! @borgarpa