shaunyuan22 / CFINet

The official implementation for ICCV'23 paper "Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning"
Apache License 2.0

[Reimplementation] Something went wrong when I tried to train FCOS on SODA-D #32

Closed CheerM closed 2 months ago

CheerM commented 3 months ago

Prerequisite

💬 Describe the reimplementation questions

I tried to run this:

```shell
CUDA_VISIBLE_DEVICES=1 python CFINet-master/tools/train.py \
    CFINet-master/configs/sodad-benchmarks/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_1x.py \
    --cfg-options work_dir=$SAVE_DIR/fcos/fcos_r50_fpn_1x
```

then got:

```
File "CFINet-master/mmdet/models/dense_heads/fcos_head.py", line 288, in get_targets
    assert len(points) == len(self.regress_ranges)
AssertionError
```

Environment

- mmdet 2.26.0
- mmcv 1.5.0
- Python 3.8
- PyTorch 1.10.0

Expected results

No response

Additional information

1) The SODA-D dataset was processed step by step, as described in README.md.

2) I modified the path/to/dataset in the config files and kept everything else the same as the latest repo.

3) What should I do to reproduce the FCOS results on SODA-D? A timely reply would be appreciated!

shaunyuan22 commented 3 months ago

It seems that the number of feature maps used for regression does not match the number of regress_ranges. Could you please share the training config, if available?

CheerM commented 3 months ago

> It seems that the number of feature maps used for regression does not match the number of regress_ranges. Could you please share the training config, if available?

Sure thing, here is the config for FCOS:

```python
dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
         transforms=[
             dict(type='Resize', keep_ratio=True),
             dict(type='RandomFlip'),
             dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                  std=[58.395, 57.12, 57.375], to_rgb=True),
             dict(type='Pad', size_divisor=32),
             dict(type='DefaultFormatBundle'),
             dict(type='Collect', keys=['img'])
         ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
                 transforms=[
                     dict(type='Resize', keep_ratio=True),
                     dict(type='RandomFlip'),
                     dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                          std=[58.395, 57.12, 57.375], to_rgb=True),
                     dict(type='Pad', size_divisor=32),
                     dict(type='DefaultFormatBundle'),
                     dict(type='Collect', keys=['img'])
                 ])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
                 transforms=[
                     dict(type='Resize', keep_ratio=True),
                     dict(type='RandomFlip'),
                     dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                          std=[58.395, 57.12, 57.375], to_rgb=True),
                     dict(type='Pad', size_divisor=32),
                     dict(type='DefaultFormatBundle'),
                     dict(type='Collect', keys=['img'])
                 ])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(
    type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001,
    paramwise_cfg=dict(bias_lr_mult=2.0, bias_decay_mult=0.0))
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
    type='FCOS',
    backbone=dict(
        type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3),
        frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=False),
        norm_eval=True, style='caffe',
        init_cfg=dict(type='Pretrained',
                      checkpoint='open-mmlab://detectron2/resnet50_caffe')),
    neck=dict(
        type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256,
        start_level=1, add_extra_convs='on_output', num_outs=4,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64],
        norm_on_bbox=True,
        centerness_on_reg=True,
        dcn_on_last_conv=False,
        center_sampling=True,
        conv_bias=True,
        loss_cls=dict(
            type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.4,
            min_pos_iou=0, ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000, min_bbox_size=0, score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.6), max_per_img=100))
work_dir = '../soda_d_results_mmdet2/fcos/fcos_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
```

Also, other issues, such as the loss turning into NaN, occurred while training RetinaNet and RepPoints. Hence, the config for training RetinaNet is also shown below:

```python
model = dict(
    type='RetinaNet',
    backbone=dict(
        type='ResNet', depth=50, num_stages=4, out_indices=(0, 1, 2, 3),
        frozen_stages=1, norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True, style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN', in_channels=[256, 512, 1024, 2048], out_channels=256,
        start_level=1, add_extra_convs='on_input', num_outs=4),
    bbox_head=dict(
        type='RetinaHead',
        num_classes=9,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator', octave_base_scale=2, scales_per_octave=3,
            ratios=[0.5, 1.0, 2.0], strides=[8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder', target_means=[0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    train_cfg=dict(
        assigner=dict(
            type='MaxIoUAssigner', pos_iou_thr=0.5, neg_iou_thr=0.4,
            min_pos_iou=0, ignore_iof_thr=-1),
        allowed_border=-1,
        pos_weight=-1,
        debug=False),
    test_cfg=dict(
        nms_pre=1000, min_bbox_size=0, score_thr=0.05,
        nms=dict(type='nms', iou_threshold=0.5), max_per_img=100))
dataset_type = 'SODADDataset'
data_root = '/data1/datasets/SODA/SODA-D/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375], to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
         transforms=[
             dict(type='Resize', keep_ratio=True),
             dict(type='RandomFlip'),
             dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                  std=[58.395, 57.12, 57.375], to_rgb=True),
             dict(type='Pad', size_divisor=32),
             dict(type='DefaultFormatBundle'),
             dict(type='Collect', keys=['img'])
         ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=2,
    train=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/train.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/train/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations', with_bbox=True),
            dict(type='Resize', img_scale=(1200, 1200), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.5),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/train.json'),
    val=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/val.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/val/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
                 transforms=[
                     dict(type='Resize', keep_ratio=True),
                     dict(type='RandomFlip'),
                     dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                          std=[58.395, 57.12, 57.375], to_rgb=True),
                     dict(type='Pad', size_divisor=32),
                     dict(type='DefaultFormatBundle'),
                     dict(type='Collect', keys=['img'])
                 ])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/val_wo_ignore.json'),
    test=dict(
        type='SODADDataset',
        ann_file='/data1/datasets/SODA/SODA-D/divData/Annotations/test.json',
        img_prefix='/data1/datasets/SODA/SODA-D/divData/Images/test/',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='MultiScaleFlipAug', img_scale=(1200, 1200), flip=False,
                 transforms=[
                     dict(type='Resize', keep_ratio=True),
                     dict(type='RandomFlip'),
                     dict(type='Normalize', mean=[123.675, 116.28, 103.53],
                          std=[58.395, 57.12, 57.375], to_rgb=True),
                     dict(type='Pad', size_divisor=32),
                     dict(type='DefaultFormatBundle'),
                     dict(type='Collect', keys=['img'])
                 ])
        ],
        ori_ann_file='/data1/datasets/SODA/SODA-D/rawData/Annotations/test_wo_ignore.json'))
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step', warmup='linear', warmup_iters=500, warmup_ratio=0.001,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=1000)
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
work_dir = '../soda_d_results_mmdet2/retinanet/retinanet_r50_fpn_1x'
auto_resume = False
gpu_ids = [0]
```

Thank you for your reply.

Actually, I'm quite confused, because everything simply followed README.md: I cloned the repo, installed the corresponding environment, and so on. There is no major change to the code, yet the results are still far from correct.

shaunyuan22 commented 3 months ago

For FCOS, the default number of regress_ranges is 5, which does not match the number of FPN output feature levels in your config, namely 4; see https://github.com/shaunyuan22/CFINet/blob/2167eebdc420046165e78a7555e40b044355cb58/mmdet/models/dense_heads/fcos_head.py#L63
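In other words, FCOSHead needs one regress range per FPN level. Below is a minimal sketch of such an override for the 4-level setup (strides=[8, 16, 32, 64]) in the config above; the boundary values are illustrative assumptions, not values taken from this repo.

```python
# Sketch: give FCOSHead one regress range per FPN level (4 here).
# The boundaries 64/128/256 are example values; tune them for SODA-D.
INF = 1e8
model = dict(
    bbox_head=dict(
        type='FCOSHead',
        strides=[8, 16, 32, 64],
        regress_ranges=((-1, 64), (64, 128), (128, 256), (256, INF))))
```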

For RetinaNet, you could increase warmup_iters, since single-stage methods are unstable during training.
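A rough sketch of that tweak in the RetinaNet config above; warmup_iters=2000 is an example value rather than a recommendation from this repo, and the gradient clipping simply mirrors the FCOS config earlier in the thread (the original RetinaNet config has grad_clip=None).

```python
# Sketch: longer linear warmup for more stable early training.
# warmup_iters=2000 is an example value; tune as needed.
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=2000,  # increased from 500
    warmup_ratio=0.001,
    step=[8, 11])
# Optional: enable gradient clipping as in the FCOS config above.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```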