open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

Why is there such a big difference between the results with and without data augmentation? #591

Closed lzwhard closed 5 years ago

lzwhard commented 5 years ago

Thanks for your excellent work. I ran into the problem described in the title. This is my config file:

model = dict(
    type='FasterRCNN',
    pretrained='modelzoo://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,              
        out_indices=(0, 1, 2, 3),   
        frozen_stages=1,          
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],     # channels of each input stage (the last layer of each stage)
        out_channels=256,                       # channels of each FPN output feature map
        num_outs=5),                            # number of FPN output feature maps
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,                        # input channels of the RPN head
        feat_channels=256,                      # channels of the RPN head's intermediate feature layer
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        use_sigmoid_cls=True),                  # whether to classify with sigmoid; if False, softmax is used
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',              # type of RoI extractor
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,                              # number of fully connected layers
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=8,  # todo add one more for background
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
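        reg_class_agnostic=False))              # whether to predict boxes in a class-agnostic way
                                                # class_agnostic means a bbox is output considering only whether it is foreground;
                                                # its class is decided afterwards from the network's class scores for that bbox, so one box can correspond to multiple classes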

# model training and testing settings
train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        smoothl1_beta=1 / 9.0,
        debug=False),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)
# dataset settings
# dataset_type = 'DriverDataset'
# data_root = '/home/gpu/datasets/lzw/lzw2/object_det/mmdetection/data/'
dataset_type = 'CustomDataset'
data_root = '/home/liuzhenwei/project/data/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=0,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train/labels.pkl',
        img_prefix=data_root + 'train/images/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0.5,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'test/labels.pkl',
        img_prefix=data_root + 'test/images/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_crowd=True,
        with_label=True),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'test/labels.pkl',
        img_prefix=data_root + 'test/images/',
        img_scale=(1333, 800),
        img_norm_cfg=img_norm_cfg,
        size_divisor=32,
        flip_ratio=0,
        with_mask=False,
        with_label=False,
        test_mode=True))
# optimizer
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 1000
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x'
load_from = None
resume_from = None
workflow = [('train', 1)]

With data augmentation, the result is very poor:

extra_aug=dict(
    photo_metric_distortion=dict(
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    expand=dict(
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 4)),
    random_crop=dict(
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3))

The mAP on the training set rose slightly at first and then quickly dropped to zero:

epoch 2: mAP=0.244, epoch 5: mAP=0.089, and zero soon after.

Without data augmentation, the result is good: the mAP on the training set rises gradually and stays essentially stable once it reaches 1.

hellock commented 5 years ago

This extra_aug is designed for SSD; some background knowledge is needed if you want to adapt it to Faster R-CNN.
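As one illustration of why settings tuned for SSD may not transfer directly, the sketch below re-implements the `expand` step in simplified form. It is not the mmdetection code: the function name `expand`, the toy image size, and the 40-pixel object are assumptions made for the example. With `ratio_range=(1, 4)` the image can be pasted onto a mean-filled canvas up to four times larger, and resizing that canvas back to the original input size shrinks every object by the same factor. Whether this is what actually caused the drop in this issue is not confirmed in the thread.

```python
import numpy as np


def expand(img, ratio, mean=(123.675, 116.28, 103.53), rng=None):
    """Paste `img` at a random offset on a mean-filled canvas `ratio` times larger.

    Simplified stand-in for an SSD-style expand augmentation (hypothetical helper).
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    h, w, c = img.shape
    new_h, new_w = int(h * ratio), int(w * ratio)
    # Fill the canvas with the per-channel mean, then copy the image into it.
    canvas = np.full((new_h, new_w, c), mean, dtype=img.dtype)
    top = int(rng.integers(0, new_h - h + 1))
    left = int(rng.integers(0, new_w - w + 1))
    canvas[top:top + h, left:left + w] = img
    return canvas, (left, top)


# Toy 100x150 image (assumed size, kept small so the example is cheap to run).
img = np.zeros((100, 150, 3), dtype=np.float32)
canvas, offset = expand(img, ratio=4)

# Resizing the canvas back to the original input size uses this scale factor.
scale = min(img.shape[1] / canvas.shape[1], img.shape[0] / canvas.shape[0])
object_px = 40 * scale  # an object that was 40 px wide before the expand step
print(canvas.shape, scale, object_px)
```

In this toy case the worst-case ratio of 4 turns a 40-pixel object into a 10-pixel one, so the detector sees much smaller objects during training than at test time.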

lzwhard commented 5 years ago

> This extra_aug is designed for SSD; some background knowledge is needed if you want to adapt it to Faster R-CNN.

Thanks for your reply. So I should tune the augmentation parameters for my dataset, right?

hellock commented 5 years ago

Yes, you are right.
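A minimal sketch of what tuning the parameters could look like is shown below. The keys are the ones from the `extra_aug` posted above, but the changed values are illustrative assumptions only: they are untuned and are not recommended anywhere in this thread.

```python
# Copied from the config in this issue so the fragment is self-contained.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# Hypothetical, untuned variant of the original extra_aug.
milder_extra_aug = dict(
    photo_metric_distortion=dict(     # unchanged from the original
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    expand=dict(
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 2)),          # assumption: canvas at most 2x instead of 4x
    random_crop=dict(
        min_ious=(0.5, 0.7, 0.9),     # assumption: drop the lowest IoU thresholds
        min_crop_size=0.5))           # assumption: larger minimum crop than 0.3
```

The resulting dict would be passed as `extra_aug` in the training dataset settings in place of the original one, and the training mAP compared against the run without augmentation.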

lzwhard commented 5 years ago

Thanks again, I will try that.

gittigxuy commented 5 years ago

@lyuwenyu @hellock, which augmentation do you think contributes most to the bad result? I think it is the random crop. The other two methods are for color jitter, and I think they could help improve the result.
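The question of which augmentation hurts most can be checked empirically with a small ablation. The sketch below builds one `extra_aug` variant per experiment. The helper name `ablation_variants` is hypothetical, and the sketch assumes each of the three entries in `extra_aug` is optional, so that a subset is still a valid setting.

```python
import copy

# The extra_aug from this issue, with mean/to_rgb written out explicitly.
full_extra_aug = dict(
    photo_metric_distortion=dict(
        brightness_delta=32,
        contrast_range=(0.5, 1.5),
        saturation_range=(0.5, 1.5),
        hue_delta=18),
    expand=dict(
        mean=[123.675, 116.28, 103.53],
        to_rgb=True,
        ratio_range=(1, 4)),
    random_crop=dict(
        min_ious=(0.1, 0.3, 0.5, 0.7, 0.9), min_crop_size=0.3))


def ablation_variants(extra_aug):
    """Return {name: extra_aug subset}: each augmentation alone, and each left out."""
    variants = {}
    for key in extra_aug:
        # Only this augmentation enabled.
        variants['only_' + key] = {key: copy.deepcopy(extra_aug[key])}
        # Every augmentation except this one enabled.
        variants['without_' + key] = {
            k: copy.deepcopy(v) for k, v in extra_aug.items() if k != key}
    return variants


variants = ablation_variants(full_extra_aug)
for name, aug in variants.items():
    print(name, sorted(aug))
```

Training once per variant and comparing the mAP curves would show whether `random_crop`, `expand`, or the color distortion accounts for the collapse.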