Open reiffd7 opened 7 months ago
I've been working on migrating an mmdetection model from v2 to v3 and haven't been able to reproduce the same results. Here is my configuration from v2:
```python
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='RegNet', arch='regnetx_400mf', out_indices=(0, 1, 2, 3),
        frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True, style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://regnetx_400mf')),
    neck=dict(
        type='FPN', in_channels=[32, 64, 160, 384], out_channels=256,
        start_level=1, add_extra_convs=True, num_outs=5),
    bbox_head=dict(
        type='ATSSHead', num_classes=1, in_channels=256, stacked_convs=4,
        feat_channels=256, norm_cfg=None,
        anchor_generator=dict(
            type='AnchorGenerator', ratios=[1.0], octave_base_scale=8,
            scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss', use_sigmoid=True, gamma=1.5, alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1, pos_weight=-1, debug=False),
    test_cfg=dict(
        nms_pre=1000, min_bbox_size=16, score_thr=0.25,
        nms=dict(type='nms', iou_threshold=0.5), max_per_img=300))
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], to_rgb=False)
albu_train_transforms = [
    dict(
        type='OneOf',
        transforms=[
            dict(type='RandomResizedCrop', p=0.33, height=544, width=960,
                 scale=(0.5, 1), ratio=(0.5, 3), interpolation=1),
            dict(type='RandomResizedCrop', p=0.66, height=544, width=960,
                 scale=(1, 1), ratio=(1, 1), interpolation=1),
        ],
        p=1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Rotate', p=0.33, limit=(-30, 30), interpolation=1,
                 border_mode=2),
            dict(type='VerticalFlip', p=0.1),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='RGBShift', p=0.5, r_shift_limit=(-30, 30),
                 g_shift_limit=(-30, 30), b_shift_limit=(-30, 30)),
            dict(type='RandomBrightnessContrast', p=0.5,
                 brightness_limit=(-0.5, 0.5), contrast_limit=(-0.33, 0.33)),
            dict(type='RandomGamma', p=0.5, gamma_limit=(80, 120)),
            dict(type='ToGray', p=0.5),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='MultiplicativeNoise', p=0.5, multiplier=(0.8, 1.2)),
            dict(type='Spatter', p=0.5, mean=0.65, std=0.3, gauss_sigma=2,
                 cutout_threshold=0.68, intensity=0.2, mode='rain'),
            dict(type='GaussNoise', p=0.5, var_limit=(10, 40), mean=0),
            dict(type='ISONoise', p=0.5, color_shift=(0.01, 0.05),
                 intensity=(0.1, 0.3)),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Perspective', p=0.5, interpolation=1, keep_size=True),
            dict(type='MotionBlur', p=0.5, blur_limit=7),
            dict(type='GaussianBlur', p=0.5),
            dict(type='ImageCompression', p=0.33, quality_lower=50,
                 quality_upper=100),
        ],
        p=0.5),
]
data = dict(
    samples_per_gpu=20,
    workers_per_gpu=4,
    train=dict(
        type='CocoDataset',
        ann_file='/workspace/datasets/person_detection_03_04_2024_train/labels.json',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Resize',
                img_scale=[(1920, 1080), (1080, 1080), (720, 720),
                           (544, 544), (960, 544)],
                multiscale_mode='value', keep_ratio=True),
            dict(
                type='Albu',
                transforms=albu_train_transforms,
                bbox_params=dict(
                    type='BboxParams', format='pascal_voc',
                    label_fields=['gt_labels'], min_visibility=0.5,
                    min_area=128, check_each_transform=True,
                    filter_lost_elements=True),
                update_pad_shape=False, skip_img_without_anno=False),
            dict(
                type='Normalize', mean=[103.53, 116.28, 123.675],
                std=[57.375, 57.12, 58.395], to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
        ],
        filter_empty_gt=False,
        classes=['person']),
    val=dict(
        type='CocoDataset',
        ann_file='/workspace/datasets/person_detection_03_04_2024_val/labels.json',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(960, 544)],
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(
                        type='Normalize', mean=[103.53, 116.28, 123.675],
                        std=[57.375, 57.12, 58.395], to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img']),
                ])
        ],
        filter_empty_gt=False,
        classes=['person']),
    test=dict(
        type='CocoDataset',
        ann_file='/workspace/datasets/person_detection_03_04_2024_test/labels.json',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(1280, 720)],
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(
                        type='Normalize', mean=[103.53, 116.28, 123.675],
                        std=[57.375, 57.12, 58.395], to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img']),
                ])
        ],
        filter_empty_gt=False,
        classes=['person']))
evaluation = dict(interval=5, metric=['bbox'], classwise=True)
optimizer = dict(
    type='AdamW', lr=0.0001, weight_decay=0.05,
    paramwise_cfg=dict(norm_decay_mult=0., bypass_duplicate=True))
optimizer_config = dict(grad_clip=dict(max_norm=25, norm_type=2))
lr_config = dict(
    policy='cyclic', target_ratio=(2.0, 1.0), cyclic_times=1,
    step_ratio_up=0.2, gamma=0.6, warmup='linear', warmup_iters=50,
    warmup_ratio=0.1)
momentum_config = dict(
    policy='cyclic', target_ratio=(0.9, 1), cyclic_times=1, step_ratio_up=0.3)
runner = dict(type='EpochBasedRunner', max_epochs=64)
checkpoint_config = dict(interval=1)
log_config = dict(
    interval=1,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
    ])
resume_from = None
load_from = None
custom_hooks = [
    dict(type='NaNHook', interval=1),
    dict(
        type='ExpMomentumEMAHook', resume_from=resume_from, momentum=0.0001,
        priority=49)
]
dist_params = dict(backend='nccl')
log_level = 'INFO'
workflow = [('train', 5), ('val', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
base_batch_size = 110
auto_scale_lr = dict(enable=False, base_batch_size=base_batch_size)
custom_imports = dict(
    imports=['mmdet.core.utils.nan_hook'], allow_failed_imports=False)
seed = 0
auto_resume = False
classes = ['person']
train_dataset = dict(
    fiftyone_dataset_name='person_detection_03_04_2024_train',
    train_data_config=dict(
        type='CocoDataset',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_train/',
        ann_file='/workspace/datasets/person_detection_03_04_2024_train/labels.json',
        classes=classes,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(
                type='Resize',
                img_scale=[(1920, 1080), (1080, 1080), (720, 720),
                           (544, 544), (960, 544)],
                multiscale_mode='value', keep_ratio=True),
            dict(type='Albu', transforms=albu_train_transforms),
            dict(
                type='Normalize', mean=[103.53, 116.28, 123.675],
                std=[57.375, 57.12, 58.395], to_rgb=False),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
        ],
        filter_empty_gt=False))
val_dataset = dict(
    fiftyone_dataset_name='person_detection_03_04_2024_val',
    val_data_config=dict(
        type='CocoDataset',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_val/',
        ann_file='/workspace/datasets/person_detection_03_04_2024_val/labels.json',
        classes=classes,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(960, 544)],
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(
                        type='Normalize', mean=[103.53, 116.28, 123.675],
                        std=[57.375, 57.12, 58.395], to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img']),
                ])
        ],
        filter_empty_gt=False))
test_dataset = dict(
    fiftyone_dataset_name='person_detection_03_04_2024_test',
    test_data_config=dict(
        type='CocoDataset',
        img_prefix='/workspace/datasets/person_detection_03_04_2024_test/',
        ann_file='/workspace/datasets/person_detection_03_04_2024_test/labels.json',
        classes=classes,
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(
                type='MultiScaleFlipAug',
                img_scale=[(1280, 720)],
                flip=False,
                transforms=[
                    dict(type='Resize', keep_ratio=True),
                    dict(
                        type='Normalize', mean=[103.53, 116.28, 123.675],
                        std=[57.375, 57.12, 58.395], to_rgb=False),
                    dict(type='Pad', size_divisor=32),
                    dict(type='ImageToTensor', keys=['img']),
                    dict(type='Collect', keys=['img']),
                ])
        ],
        filter_empty_gt=False))
work_dir = './work_dirs/retinanet_regnetx_800mf_fpn_1x8_1x_person_collection'
gpu_ids = [0]
fp16 = dict(loss_scale='dynamic')
```
Using this configuration I was able to achieve nearly 70% mAP@0.50.
Next, I replicated this experiment using v3:
```python
model = dict(
    type='RetinaNet',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[103.53, 116.28, 123.675],
        std=[57.375, 57.12, 58.395],
        bgr_to_rgb=False,
        pad_size_divisor=32),
    backbone=dict(
        type='RegNet', arch='regnetx_400mf', out_indices=(0, 1, 2, 3),
        frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True, style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://regnetx_400mf')),
    neck=dict(
        type='FPN', in_channels=[32, 64, 160, 384], out_channels=256,
        start_level=1, add_extra_convs=True, num_outs=5),
    bbox_head=dict(
        type='ATSSHead', num_classes=1, in_channels=256, stacked_convs=4,
        feat_channels=256, norm_cfg=None,
        anchor_generator=dict(
            type='AnchorGenerator', ratios=[1.0], octave_base_scale=8,
            scales_per_octave=1, strides=[8, 16, 32, 64, 128]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[0.1, 0.1, 0.2, 0.2]),
        loss_cls=dict(
            type='FocalLoss', use_sigmoid=True, gamma=1.5, alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=2.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    # model training and testing settings
    train_cfg=dict(
        assigner=dict(type='ATSSAssigner', topk=9),
        allowed_border=-1, pos_weight=-1, debug=False),
    test_cfg=dict(
        nms_pre=1000, min_bbox_size=16, score_thr=0.25,
        nms=dict(type='nms', iou_threshold=0.5), max_per_img=300))
# dataset settings
dataset_type = 'CocoDataset'
data_root = '/home/ubuntu/mmdetection-v3/'
backend_args = None
albu_train_transforms = [
    dict(
        type='OneOf',
        transforms=[
            dict(type='RandomResizedCrop', p=0.33, height=544, width=960,
                 scale=(0.5, 1), ratio=(0.5, 3), interpolation=1),
            dict(type='RandomResizedCrop', p=0.66, height=544, width=960,
                 scale=(1, 1), ratio=(1, 1), interpolation=1),
        ],
        p=1),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Rotate', p=0.33, limit=(-30, 30), interpolation=1,
                 border_mode=2),
            dict(type='VerticalFlip', p=0.1),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='RGBShift', p=0.5, r_shift_limit=(-30, 30),
                 g_shift_limit=(-30, 30), b_shift_limit=(-30, 30)),
            dict(type='RandomBrightnessContrast', p=0.5,
                 brightness_limit=(-0.5, 0.5), contrast_limit=(-0.33, 0.33)),
            dict(type='RandomGamma', p=0.5, gamma_limit=(80, 120)),
            dict(type='ToGray', p=0.5),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='MultiplicativeNoise', p=0.5, multiplier=(0.8, 1.2)),
            dict(type='Spatter', p=0.5, mean=0.65, std=0.3, gauss_sigma=2,
                 cutout_threshold=0.68, intensity=0.2, mode='rain'),
            dict(type='GaussNoise', p=0.5, var_limit=(10, 40), mean=0),
            dict(type='ISONoise', p=0.5, color_shift=(0.01, 0.05),
                 intensity=(0.1, 0.3)),
        ],
        p=0.66),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Perspective', p=0.5, interpolation=1, keep_size=True),
            dict(type='MotionBlur', p=0.5, blur_limit=7),
            dict(type='GaussianBlur', p=0.5),
            dict(type='ImageCompression', p=0.33, quality_lower=50,
                 quality_upper=100),
        ],
        p=0.5),
]
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RandomFlip', prob=0.5),
    dict(
        type='RandomChoiceResize',
        scales=[(1920, 1080), (1080, 1080), (720, 720), (544, 544),
                (960, 544)],
        keep_ratio=True),
    dict(
        type='Albu',
        transforms=albu_train_transforms,
        bbox_params=dict(
            type='BboxParams',
            format='pascal_voc',
            label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
            min_visibility=0.5,
            min_area=128,
            check_each_transform=True),
        keymap={'img': 'image', 'gt_bboxes': 'bboxes'},
        # update_pad_shape=False,
        skip_img_without_anno=True),
    dict(type='PackDetInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='Resize', scale=(960, 544), keep_ratio=True),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Pad', size_divisor=32),
    dict(
        type='PackDetInputs',
        meta_keys=('img_path', 'img_id', 'seg_map_path', 'height', 'width',
                   'instances', 'sample_idx', 'img', 'img_shape', 'ori_shape',
                   'scale', 'scale_factor', 'keep_ratio', 'homography_matrix',
                   'gt_bboxes', 'gt_ignore_flags', 'gt_bboxes_labels'))
]
batch_size = 20
num_workers = 4
train_dataloader = dict(
    batch_size=batch_size,
    num_workers=num_workers,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='data/person_detection_03_04_2024_train/labels.json',
        data_prefix=dict(img='data/person_detection_03_04_2024_train/data/'),
        filter_cfg=dict(filter_empty_gt=False, min_size=32),
        pipeline=train_pipeline,
        backend_args=backend_args))
val_dataloader = dict(
    batch_size=batch_size,
    num_workers=num_workers,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='data/person_detection_03_04_2024_val/labels.json',
        data_prefix=dict(img='data/person_detection_03_04_2024_val/data/'),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader
val_evaluator = dict(
    type='CocoMetric',
    ann_file='data/person_detection_03_04_2024_val/labels.json',
    metric=['bbox'],
    format_only=False,
    backend_args=backend_args)
test_evaluator = val_evaluator
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=64, val_interval=5)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.0002, weight_decay=0.05, eps=1e-8,
        betas=(0.9, 0.999)),
    clip_grad=dict(max_norm=25, norm_type=2),
    paramwise_cfg=dict(
        custom_keys={'backbone': dict(lr_mult=0.1, decay_mult=1.0)},
        norm_decay_mult=0.0))
param_scheduler = dict(
    type='OneCycleLR', eta_max=0.0002, pct_start=0.2, div_factor=2,
    by_epoch=False)
default_scope = 'mmdet'
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=2),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(
        type='DetVisualizationHook', draw=True, interval=5, show=False))
custom_hooks = [
    dict(type='CheckInvalidLossHook', interval=50, priority='VERY_LOW'),
]
env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'))
vis_backends = [
    dict(type='LocalVisBackend'),
    dict(type='TensorboardVisBackend'),
]
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer',
    save_dir='data/logs/')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)
log_level = 'INFO'
load_from = None
resume = False
gpu_ids = [0]
fp16 = dict(loss_scale='dynamic')
```
With this configuration, I achieved 60% mAP@0.50.
I believe I have correctly migrated the relevant aspects of the configuration according to https://mmdetection.readthedocs.io/en/latest/migration/config_migration.html, such as:

* image normalization
* data transformations, such as the RandomChoiceResize operation
* the optimizer
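For the normalization item above, the mapping I applied between the two configs follows this pattern (a sketch summarizing the two configs in this post; in v2 the statistics live in the data pipeline, while in v3 they move into the model's data_preprocessor):

```python
# v2: normalization/padding are pipeline transforms driven by img_norm_cfg.
img_norm_cfg = dict(
    mean=[103.53, 116.28, 123.675], std=[57.375, 57.12, 58.395], to_rgb=False)
# pipeline entries:
#   dict(type='Normalize', **img_norm_cfg),
#   dict(type='Pad', size_divisor=32)

# v3: the same statistics move into the model's data_preprocessor.
data_preprocessor = dict(
    type='DetDataPreprocessor',
    mean=[103.53, 116.28, 123.675],
    std=[57.375, 57.12, 58.395],
    bgr_to_rgb=False,      # replaces to_rgb=False
    pad_size_divisor=32)   # replaces the Pad transform
```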
Something I find curious is that the v2 experiment's mAP is very high immediately (after 5 epochs), while in the v3 experiment it starts very low and gradually improves. I thought this could be due to issues reading the checkpoint file in v3. I tried manually downloading the checkpoint file and replacing the checkpoint URL with my local file path, but this didn't change anything. Additionally, when I removed the checkpoint file entirely the performance was significantly worse, so I don't think checkpoint loading is causing the issue.
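One more direct way to rule out a silent loading failure is to compare the built model's backbone weights against the checkpoint's state dict. A minimal sketch of that idea (the helper name and comparator are mine, not an mmdet API):

```python
def loaded_fraction(model_state, ckpt_state, prefix='backbone.', same=None):
    """Fraction of `prefix`-parameters in model_state that exactly match the
    corresponding checkpoint entry. For torch tensors, pass same=torch.equal."""
    if same is None:
        same = lambda a, b: a == b
    matched, total = 0, 0
    for name, value in model_state.items():
        if not name.startswith(prefix):
            continue  # only compare backbone parameters
        key = name[len(prefix):]
        if key in ckpt_state:
            total += 1
            if same(ckpt_state[key], value):
                matched += 1
    return matched / total if total else 0.0
```

If this is well below 1.0 right after model construction, the pretrained weights were not applied; mmengine also logs missing/unexpected keys at load time, which is worth checking in the training log.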
Has anyone else experienced issues replicating performance while migrating from v2 to v3? Any help would be greatly appreciated!!
If you have found the reason or any potential solution, I would greatly appreciate your help!
Is this one custom dataset? I am facing similar issue.
Why does this phenomenon occur? Have you found the reason? If you could tell me, I would be extremely grateful.
@CFZ1 I had this issue with the VOC dataset and it was because of boxes getting filtered based on minimum size.
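Following up on that: in the configs posted above there are several places where small boxes (or small images) can be dropped, so it may be worth relaxing them one at a time to see whether filtering explains the gap. A sketch of the relevant knobs, with illustrative "disabled" values (exactly which sizes each knob filters depends on the dataset/transform implementation):

```python
# Dataset-level filtering (the v3 config above uses min_size=32):
filter_cfg = dict(filter_empty_gt=False, min_size=0)

# Albu bbox filtering (present in both configs): boxes below these
# thresholds are removed during augmentation.
bbox_params = dict(
    type='BboxParams', format='pascal_voc',
    label_fields=['gt_bboxes_labels', 'gt_ignore_flags'],
    min_visibility=0.0,  # was 0.5
    min_area=0)          # was 128

# Test-time filtering: predicted boxes smaller than this are discarded.
test_cfg = dict(min_bbox_size=0)  # was 16
```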
https://github.com/open-mmlab/mmdetection/issues/10502#issuecomment-1593020683
Thanks a lot.