open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

assert img_meta.get('scale_factor') is not None AssertionError #11164

Open 99HU opened 1 year ago

99HU commented 1 year ago

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

I tried to train Faster R-CNN on my own dataset. Training itself runs normally, but the error occurs when the model needs to validate and save a checkpoint. My config follows the OpenMMLab guide on Bilibili. I am a beginner, so any advice would be appreciated.

Reproduction

  1. What command or script did you run?

python tools\train.py moudle\fasterRcnn.py

  2. Did you make any modifications on the code or config? Did you understand what you have modified?

I commented out some data augmentation transforms to save training time. Here is my config:

```python
# Dataset type and paths
dataset_type = 'CocoDataset'
data_root = ''
metainfo = {'classes': ('paper',)}
NUM_CLASSES = len(metainfo['classes'])

# Pretrained model weights
load_from = 'https://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth'

# Training hyperparameters
MAX_EPOCHS = 100
TRAIN_BATCH_SIZE = 4
VAL_BATCH_SIZE = 1
VAL_INTERVAL = 1  # evaluate and save model weights every N epochs
train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=MAX_EPOCHS, val_interval=VAL_INTERVAL)
val_cfg = dict(type='ValLoop')
test_cfg = dict(type='TestLoop')

# Pipelines
backend_args = None
train_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PackDetInputs')
]
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=backend_args),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                    'scale_factor'))
]

# DataLoaders
train_dataloader = dict(
    batch_size=TRAIN_BATCH_SIZE,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(type='DefaultSampler', shuffle=True),
    batch_sampler=dict(type='AspectRatioBatchSampler'),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        metainfo=metainfo,
        ann_file=r'E:\XuyuanFiles\csqHelpProject\keypoint_rcnn_training_pytorch\Mpaper_data_my\Mpaper_data_my\Test.json',
        data_prefix=dict(img=''),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=train_pipeline,
        backend_args=backend_args))
val_dataloader = dict(
    batch_size=VAL_BATCH_SIZE,
    num_workers=2,
    persistent_workers=True,
    drop_last=False,
    sampler=dict(type='DefaultSampler', shuffle=False),
    dataset=dict(
        type=dataset_type,
        metainfo=metainfo,
        data_root=data_root,
        ann_file=r'E:\XuyuanFiles\csqHelpProject\keypoint_rcnn_training_pytorch\Mpaper_data_my\Mpaper_data_my\Test.json',
        data_prefix=dict(img=''),
        test_mode=True,
        pipeline=test_pipeline,
        backend_args=backend_args))
test_dataloader = val_dataloader

# Evaluators: metrics on the test set
# NOTE: the next two assignments are overridden by the list that follows.
val_evaluator = dict(type='CocoMetric', ann_file=data_root + 'val_coco.json',
                     metric='bbox', format_only=False, backend_args=backend_args)
val_evaluator = dict(type='VOCMetric', metric='mAP', eval_mode='11points')
val_evaluator = [
    dict(type='CocoMetric',
         ann_file=r'E:\XuyuanFiles\csqHelpProject\keypoint_rcnn_training_pytorch\Mpaper_data_my\Mpaper_data_my\Test.json',
         metric='bbox', format_only=False, backend_args=backend_args),
    dict(type='VOCMetric', metric='mAP', eval_mode='11points')
]
test_evaluator = val_evaluator

# Model structure
model = dict(
    type='FasterRCNN',
    data_preprocessor=dict(
        type='DetDataPreprocessor',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        bgr_to_rgb=True,
        pad_size_divisor=32),
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=True,
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
    roi_head=dict(
        type='StandardRoIHead',
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='Shared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=NUM_CLASSES,
            bbox_coder=dict(
                type='DeltaXYWHBBoxCoder',
                target_means=[0., 0., 0., 0.],
                target_stds=[0.1, 0.1, 0.2, 0.2]),
            reg_class_agnostic=False,
            loss_cls=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='L1Loss', loss_weight=1.0))),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=-1,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))

# Learning rate schedule
param_scheduler = [
    dict(type='LinearLR', start_factor=0.001, by_epoch=False, begin=0, end=500),
    dict(type='MultiStepLR', begin=0, end=12, by_epoch=True,
         milestones=[8, 11], gamma=0.1)
]

# Optimizer
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001))

# Scaling LR automatically
auto_scale_lr = dict(enable=False, base_batch_size=16)

default_scope = 'mmdet'

# Hooks
default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=1),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', interval=1, max_keep_ckpts=2,
                    save_best='coco/bbox_mAP'),  # alternatives: 'auto', 'coco/bbox_mAP_50', 'coco/bbox_mAP_75', 'coco/bbox_mAP_s'
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='DetVisualizationHook'))

env_cfg = dict(
    cudnn_benchmark=False,
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    dist_cfg=dict(backend='nccl'),
)

vis_backends = [dict(type='LocalVisBackend')]
visualizer = dict(
    type='DetLocalVisualizer', vis_backends=vis_backends, name='visualizer')
log_processor = dict(type='LogProcessor', window_size=50, by_epoch=True)

log_level = 'INFO'
load_from = None  # NOTE: this overrides the pretrained-weights URL set above
resume = False
```

  3. What dataset did you use?

I use my own dataset:

```json
{
  "images": [
    {
      "id": 1,
      "file_name": "E:\\XuyuanFiles\\csqHelpProject\\keypoint_rcnn_training_pytorch\\Mpaper_data_my\\Mpaper_data_my\\Test\\images\\1.jpg",
      "width": 256,
      "height": 256
    },
    .....
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [48, 6, 159, 161],
      "area": 25599,
      "num_keypoints": 4,
      "keypoints": [61.223915100097656, 19.399999618530273, 2,
                    193.72567749023438, 18.799999237060547, 2,
                    194.1758575439453, 154.0, 2,
                    60.32356262207031, 153.8000030517578, 2]
    },
    {
      "id": 2,
      "image_id": 1,
      "category_id": 1,
      "bbox": [73, 156, 107, 67],
      "area": 7169,
      "num_keypoints": 4,
      "keypoints": [85.83352661132812, 168.39999389648438, 2,
                    166.71511840820312, 170.0, 2,
                    167.46542358398438, 210.60000610351562, 2,
                    85.53340911865234, 210.60000610351562, 2]
    },
    .....
  ],
  "categories": [
    {
      "supercategory": "paper",
      "id": 1,
      "name": "paper"
    }
  ],
  "keypoints": ["LT", "RT", "RD", "LD"]
}
```

Environment

python 3.8, mmcv 2.1.0, mmdet 3.2.0, mmengine 0.9.1

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

Error traceback

If applicable, paste the error traceback here.

A placeholder for traceback.

Bug fix

If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

handleandwheel commented 10 months ago

I think the problem is the augmentation code you commented out. I hit a similar bug today, and it turned out to be because I had forgotten to add a resize step.

This error comes from the _bbox_post_process function in a prediction head class, where the code looks like:

```python
...
if rescale:
    assert img_meta.get('scale_factor') is not None
    scale_factor = [1 / s for s in img_meta['scale_factor']]
    results.bboxes = scale_boxes(results.bboxes, scale_factor)
...
```

It means the interpreter thinks you resized an image but forgot to add the scale_factor key to that image's metadata. The scale_factor key is normally added to the metadata automatically when the image passes through a resize transform such as 'RandomResize', as you can see in its docstring:

'''
...
Required Keys:

- img
- gt_bboxes
- gt_seg_map
- gt_keypoints

Modified Keys:

- img
- gt_bboxes
- gt_seg_map
- gt_keypoints
- img_shape

Added Keys:

- scale
- scale_factor
- keep_ratio
...
'''
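
To see this concretely, here is a minimal sketch (my own illustration, assuming mmdet 3.x with mmcv 2.x installed) that runs a bare image through the Resize transform and shows scale_factor appearing in the results dict:

```python
# Minimal sketch: show that a resize transform adds 'scale_factor' to results.
# The 256x256 dummy image and 512x512 target are arbitrary illustration values.
import numpy as np
from mmdet.datasets.transforms import Resize

results = dict(img=np.zeros((256, 256, 3), dtype=np.uint8))
results = Resize(scale=(512, 512), keep_ratio=True)(results)
print(results['scale_factor'])  # e.g. (2.0, 2.0), which PackDetInputs then packs
```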

So I guess the simplest fix is to restore the augmentation code you commented out, especially the resize-related transforms. If you wish to keep the original size of your images, you can simply set the target size to the size of your original images, as in the sketch below.
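
For example (an untested sketch of my own; scale=(256, 256) matches your images' reported size, so they are effectively left unchanged):

```python
# Sketch: test pipeline with the resize step restored so 'scale_factor' is set.
# scale=(256, 256) matches the 256x256 images in the report; use your own size.
test_pipeline = [
    dict(type='LoadImageFromFile', backend_args=None),
    dict(type='Resize', scale=(256, 256), keep_ratio=True),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='PackDetInputs',
         meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',
                    'scale_factor'))
]
```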

But I suspect there is a cleaner way to sidestep the resize requirement entirely, instead of adding a dummy resize; I just don't know how to do it.
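
One possibility (a hypothetical, untested sketch of my own; the SetUnitScaleFactor name is invented for illustration and is not part of mmdet) would be a tiny custom transform that injects a unit scale_factor, making the final rescale a no-op:

```python
# Hypothetical sketch: a custom transform that writes scale_factor=(1.0, 1.0)
# into the results dict so the post-processing rescale becomes a no-op.
from mmcv.transforms import BaseTransform
from mmdet.registry import TRANSFORMS

@TRANSFORMS.register_module()
class SetUnitScaleFactor(BaseTransform):
    """Add a unit scale_factor for pipelines that never resize."""

    def transform(self, results: dict) -> dict:
        results['scale_factor'] = (1.0, 1.0)
        return results
```

It would then be listed in the pipeline before PackDetInputs, e.g. dict(type='SetUnitScaleFactor').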