Hi, first you can try using the val dataset to test your code and make sure your inference code works correctly. I haven't encountered this problem, so you may need to provide more details.
Ok, thank you for your answer. How do I test the inference code? Where is the code that calculates the mAP? What details do I need to provide?
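For reference, in MMDetection 2.x evaluation is normally run with tools/test.py, and VOC-style mAP is computed by eval_map in mmdet/core/evaluation/mean_ap.py, which VOCDataset.evaluate calls internally. A minimal sketch of scoring saved detections against a split (the config and file paths here are placeholders):

```python
# Minimal sketch: evaluate saved detections against a chosen split
# (MMDetection 2.x). Generate the detections first, e.g.:
#   python tools/test.py my_config.py latest.pth --out results.pkl
# To check inference on the val split instead, point data.test at val.txt
# in the config *before* running tools/test.py.
import mmcv
from mmcv import Config
from mmdet.datasets import build_dataset

cfg = Config.fromfile('my_config.py')  # placeholder config path
# Build the same split that results.pkl was produced on.
dataset = build_dataset(cfg.data.test, dict(test_mode=True))
results = mmcv.load('results.pkl')

# VOCDataset.evaluate() dispatches to mmdet.core.eval_map for metric='mAP',
# i.e. the mAP computation lives in mmdet/core/evaluation/mean_ap.py.
print(dataset.evaluate(results, metric='mAP'))
```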
Thank you! I will try to test it with your approach. I have previously used the demo code to visualize the test set, and the results were not good, but I will try your approach. The test mAP is now about 0.7 because I changed the ratios and strides back to the initial ones, but it is still much lower than the val mAP. My config files are shown below:
```python
_base_ = [
    '../_base_/models/faster_rcnn_r50_fpn.py',
    '../_base_/datasets/voc0712.py',
    '../_base_/default_runtime.py'
]
model = dict(roi_head=dict(bbox_head=dict(num_classes=3)))
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(policy='step', step=[20])
runner = dict(type='EpochBasedRunner', max_epochs=45)
```
```python
# ../_base_/models/faster_rcnn_r50_fpn.py (model base config)
model = dict(
    type='FasterRCNN',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),  # output the feature maps of all four residual stages
        frozen_stages=1,  # freeze the first conv stage and below during training (for fine-tuning)
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # channel sizes of the four ResNet stages
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,  # takes the 5 FPN feature maps as input and generates proposals
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],  # base anchor scale
            ratios=[0.5, 1.0, 2.0],  # anchor aspect ratios
            strides=[4, 8, 16, 32, 64]),  # these determine the anchor sizes per FPN level
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',  # encodes the bbox regression targets
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),  # CrossEntropyLoss for classification
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),  # L1Loss for regression
    roi_head=dict(  # predicts from the RPN proposals and the full-image feature maps
        type='StandardRoIHead',
        bbox_roi_extractor=dict(  # crops the proposal regions out of the full feature maps
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=80,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',  # IoU-based assigner
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
```
```python
# ../_base_/datasets/voc0712.py (dataset base config)
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=8,
    workers_per_gpu=3,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
            ],
            img_prefix=[data_root + 'VOC2007/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/val.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))
evaluation = dict(interval=5, metric='mAP')
```
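One thing worth noting in the config above: training reads VOC2007/ImageSets/Main/trainval.txt while validation reads val.txt, and in the standard VOC split layout val is a subset of trainval, so the reported val mAP is measured on images the model was trained on. A quick way to check this overlap (assuming the standard VOCdevkit layout):

```python
# Quick overlap check between the training and evaluation image-ID lists.
root = 'data/VOCdevkit/VOC2007/ImageSets/Main/'

def read_ids(name):
    with open(root + name) as f:
        return {line.strip() for line in f if line.strip()}

trainval, val, test = (read_ids(n) for n in ('trainval.txt', 'val.txt', 'test.txt'))
print('val ids also in trainval:', len(val & trainval), '/', len(val))
print('test ids also in trainval:', len(test & trainval), '/', len(test))
```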
I have tried the val data, and the mAP is 1, the same as the mAP during training. Does this prove that the inference code is normal? How can I modify the code to improve the test mAP?
Hi, did you use the original VOC dataset, or your own dataset? It seems that the model is overfitting. Try using some data augmentation to avoid this.
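As a concrete starting point, augmentations in MMDetection 2.x are just extra transforms stacked in train_pipeline; a minimal sketch using built-in transforms (the specific choices and flip_ratio here are illustrative, not tuned):

```python
# Sketch of a train_pipeline with extra augmentation (MMDetection 2.x).
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PhotoMetricDistortion'),   # random brightness/contrast/saturation/hue
    dict(type='MinIoURandomCrop'),        # IoU-constrained random crop (from the SSD recipe)
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),  # horizontal flip on half the samples
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
```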
Hi, thank you for your help. My dataset is small, so I will try some data augmentation methods.
When I train faster_rcnn_r50_fpn or ssd300, the val mAP is very high (it reaches 1) and the loss is low, but when I test the model, the mAP is only about 0.4. I have previously trained on the same VOC dataset with https://github.com/endernewton/tf-faster-rcnn and the test mAP was over 0.85. I cannot figure out this problem; it has been bothering me, and I hope to get your help.
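For quickly eyeballing what a trained model actually predicts on test images, the high-level MMDetection 2.x API can be used; a minimal sketch (the config, checkpoint, and image paths are placeholders):

```python
# Sketch: visualize detections on a single test image (MMDetection 2.x API).
from mmdet.apis import init_detector, inference_detector

config_file = 'my_config.py'    # placeholder
checkpoint_file = 'latest.pth'  # placeholder
img = 'data/VOCdevkit/VOC2007/JPEGImages/000001.jpg'  # placeholder image

model = init_detector(config_file, checkpoint_file, device='cuda:0')
result = inference_detector(model, img)
# Draw boxes above a confidence threshold and save to disk for inspection.
model.show_result(img, result, score_thr=0.3, out_file='vis_000001.jpg')
```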