hyz-xmaster / VarifocalNet

VarifocalNet: An IoU-aware Dense Object Detector
Apache License 2.0

Some question about the inference #16

Open Yangr116 opened 3 years ago

Yangr116 commented 3 years ago

Thanks for your nice work! I ran into some problems when using this model on a custom dataset.

I trained a model that achieved a satisfying mAP on the validation set and on the first test set (A), but it performed poorly on the second test set (B). I also found that the results differed between repeated test runs: for example, the output JSON file was 15 MB for the first run, 20 MB for the second, and so on. I still don't know what causes this. Could it be the multi-scale testing?

That's my config:

model = dict(
    type='VFNet',
    pretrained='open-mmlab://res2net101_v1d_26w_4s',
    backbone=dict(
        type='Res2Net',
        depth=101,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, True, True, True),
        plugins=[dict(cfg=dict(type='ContextBlock', ratio=1. / 4),
                      stages=(False, True, True, True),
                      position='after_conv3')]
    ),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='VFNetHead',
        num_classes=num_classes,
        in_channels=256,
        stacked_convs=3,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        center_sampling=False,
        dcn_on_last_conv=True,
        use_atss=True,
        use_vfl=True,
        loss_cls=dict(
            type='VarifocalLoss',
            use_sigmoid=True,
            alpha=0.75,
            gamma=2.0,
            iou_weighted=True,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.5),
        loss_bbox_refine=dict(type='GIoULoss', loss_weight=2.0)),
    # training and testing settings
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000,
        min_bbox_size=0,
        score_thr=0.05,
        nms=dict(type='soft_nms', iou_threshold=0.5),
        max_per_img=150))  # 150

img_norm_cfg = dict(mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

albu_train_transforms = [
    dict(
        type='OneOf',
        transforms=[
            dict(type='IAAAdditiveGaussianNoise', p=0.5),
            dict(type='CLAHE'),
            dict(type='IAASharpen'),
            dict(type='IAAEmboss'),
            dict(type='RandomBrightnessContrast')], p=0.5),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Mosaic', prob=0.5, img_dir='data/train/image',
         json_path='data/annotation/train2.json'),
    dict(type='Resize', img_scale=[(4096, 600), (4096, 1000)], multiscale_mode='range', keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),

    dict(type='Albu',
         transforms=albu_train_transforms,
         bbox_params=dict(type='BboxParams',
                          format='pascal_voc',
                          label_fields=['gt_labels'],
                          min_visibility=0.0,
                          filter_lost_elements=True),
         keymap={'img': 'image', 'gt_bboxes': 'bboxes'},
         update_pad_shape=False,
         skip_img_without_anno=True),

    dict(type='Normalize', **img_norm_cfg),
    dict(type='GridMask', use_h=True, use_w=True, rotate=1, offset=False, ratio=0.5, mode=1, prob=0.8),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=[(4096, 600), (4096, 800), (4096, 1000)],
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        classes=classes,
        ann_file=train_ann_file,
        img_prefix=image_dir,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        classes=classes,
        ann_file=val_ann_file,
        img_prefix=image_dir,
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        classes=classes,
        ann_file=test_ann_file,
        img_prefix=test_image_dir,
        pipeline=test_pipeline))

optimizer = dict(lr=0.01, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))

optimizer_config = dict(grad_clip=None)

lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.1,
    step=[36, 40])

log_config = dict(interval=50,
                  hooks=[
                      dict(type='TextLoggerHook'),
                      # dict(type='TensorboardLoggerHook')
                      ])

workflow = [('train', 1)]

runner = dict(type='EpochBasedRunner', max_epochs=41)

swa_optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
swa_lr_config = dict(
    policy='cyclic',
    target_ratio=(1, 0.01),
    cyclic_times=12,
    step_ratio_up=0.0)
swa_runner = dict(type='EpochBasedRunner', max_epochs=18)

load_from = 'checkpoints/vfnet_r2_101_dcn_ms_2x_51.1.pth'
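
One way to check the multi-scale guess is to run the same checkpoint with a single test scale and see whether the run-to-run differences remain. The fragment below is only a sketch of that idea: the name `single_scale_test_pipeline` is hypothetical, and choosing `(4096, 800)` (the middle of the three original scales) is an assumption made for illustration. Everything else is copied from the test pipeline in the config.

```python
# Copied from the config above so that this fragment is self-contained.
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

# Same transforms as the original test_pipeline, but with one scale only.
# The variable name and the chosen scale are illustrative assumptions.
single_scale_test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(4096, 800),  # a single scale instead of a list of three
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img']),
        ])
]
```

If the results are stable with a single scale but not with three, that would point toward the multi-scale merging step rather than the model itself.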

Looking forward to your reply. Thanks!

hyz-xmaster commented 3 years ago

Hi, you may change nms=dict(type='soft_nms', iou_threshold=0.5) to nms=dict(type='nms', iou_threshold=0.65) when doing multi-scale testing, and then check the test results. In my experience, soft_nms brings only a 0.1 mAP gain when using a huge backbone.
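
Applied to the `test_cfg` from the posted config, the suggested change would look roughly like this (a sketch; all other values are kept exactly as in the original config):

```python
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    # was: nms=dict(type='soft_nms', iou_threshold=0.5)
    nms=dict(type='nms', iou_threshold=0.65),
    max_per_img=150)
```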

Yangr116 commented 3 years ago

Thanks for your quick reply. I don't think the choice between 'soft_nms' and 'nms' would cause such a big gap, though it is possible. The trained model obtained 58% mAP on the first test set (A) but only 52% mAP on the second test set (B). When I tested the same model on test set B again, it obtained only 41% mAP.

I would also like to know whether you have ever seen mAP drop randomly like this in your experiments.
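
To narrow down where two runs of the same model diverge, the two output files can be compared image by image. The sketch below assumes the outputs are COCO-style result lists (one dict per detection with `image_id`, `category_id`, `bbox`, and `score`); the function names and the toy data are hypothetical and only stand in for the real files.

```python
import json
from collections import Counter


def load(path):
    """Load a COCO-style result list from a JSON file."""
    with open(path) as f:
        return json.load(f)


def summarize(detections):
    """Return the total number of detections and the count per image."""
    per_image = Counter(d['image_id'] for d in detections)
    return len(detections), per_image


def compare_runs(dets_a, dets_b):
    """Report images whose detection counts differ between two runs."""
    total_a, per_a = summarize(dets_a)
    total_b, per_b = summarize(dets_b)
    differing = sorted(
        img for img in set(per_a) | set(per_b) if per_a[img] != per_b[img])
    return dict(total_a=total_a, total_b=total_b, differing_images=differing)


# Toy data standing in for two runs; with real files, use load(path) instead.
run_1 = [
    dict(image_id=1, category_id=0, bbox=[0, 0, 10, 10], score=0.9),
    dict(image_id=2, category_id=0, bbox=[5, 5, 10, 10], score=0.8),
]
run_2 = run_1 + [
    dict(image_id=2, category_id=0, bbox=[6, 5, 10, 10], score=0.1),
]

report = compare_runs(run_1, run_2)
print(report)  # {'total_a': 2, 'total_b': 3, 'differing_images': [2]}
```

A deterministic test pipeline should give identical counts for every image, so any image listed in `differing_images` is a good starting point for inspecting what changed between runs.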

Thanks!

hyz-xmaster commented 3 years ago

It's a bit odd to see a random drop in performance; I have never experienced this in my experiments. I did, however, see performance drop when using soft-nms in multi-scale testing if the iou_threshold was not set appropriately.
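
To make the difference between the two NMS variants concrete, here is a simplified pure-Python sketch of hard NMS and linear soft-NMS. It is not mmcv's implementation: the function names, the `min_score` parameter, and the toy boxes are assumptions made for illustration. Boxes A and B stand in for the near-duplicate detections of one object that merging several test scales can produce.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def hard_nms(boxes, scores, iou_threshold):
    """Keep the top box and drop every box overlapping it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append((best, scores[best]))
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


def linear_soft_nms(boxes, scores, iou_threshold, min_score):
    """Decay overlapping scores by (1 - IoU) instead of dropping the boxes."""
    scores = list(scores)  # work on a copy; scores are modified in place
    remaining = list(range(len(boxes)))
    keep = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < min_score:
            break  # every remaining box scores even lower
        keep.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            if overlap > iou_threshold:
                scores[i] *= 1.0 - overlap
    return keep


# A and B are near-duplicates (IoU 0.9); C is a separate object.
boxes = [(0, 0, 100, 100), (0, 0, 100, 90), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]

hard = hard_nms(boxes, scores, iou_threshold=0.65)
soft = linear_soft_nms(boxes, scores, iou_threshold=0.5, min_score=0.05)
print([i for i, _ in hard])  # [0, 2]
print([i for i, _ in soft])  # [0, 2, 1]
```

In this toy case hard NMS removes the duplicate B entirely, while soft-NMS keeps it with its score reduced from 0.8 to about 0.08, which is still above the 0.05 cut-off used here. That is one way an unsuitable soft-NMS setting can leave extra low-scoring duplicates in the final output.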

Yangr116 commented 3 years ago

Thanks for your reply, and thanks for your nice work again. I will conduct some experiments later.