open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

When training I met with a problem: label = self.cat2label[name] KeyError: 'w' #3164

Closed. Ruolingdeng closed this issue 3 years ago.

Ruolingdeng commented 4 years ago

I am using the PASCAL VOC dataset for training. When I executed the command "python tools/train.py configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py --gpus 1 --work_dir merge-output", I ran into a problem: label = self.cat2label[name] KeyError: 'w'

The details of the problem are as follows:

2020-07-01 08:55:05,854 - mmdet - INFO - Environment info:
sys.platform: linux
Python: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21) [GCC 7.3.0]
CUDA available: True
CUDA_HOME: /usr/local/cuda-9.0
NVCC: Cuda compilation tools, release 9.0, V9.0.176
GPU 0: GeForce GTX 1080 Ti
GCC: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
PyTorch: 1.1.0
PyTorch compiling details: PyTorch built with:
TorchVision: 0.3.0
OpenCV: 4.2.0
MMCV: 0.5.9
MMDetection: 1.1.0+unknown
MMDetection Compiler: GCC 5.4
MMDetection CUDA Compiler: 9.0

2020-07-01 08:55:05,855 - mmdet - INFO - Distributed training: False
2020-07-01 08:55:05,855 - mmdet - INFO - Config: /home/drl/Downloads/mmdetection-1.1.0-later/configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py

# model settings

model = dict(
    type='FasterRCNN',
    pretrained='torchvision://resnet50',
    backbone=dict(
        type='ResNet',
        depth=50,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='RPNHead',
        in_channels=256,
        feat_channels=256,
        anchor_scales=[8],
        anchor_ratios=[0.5, 1.0, 2.0],
        anchor_strides=[4, 8, 16, 32, 64],
        target_means=[.0, .0, .0, .0],
        target_stds=[1.0, 1.0, 1.0, 1.0],
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    bbox_roi_extractor=dict(
        type='SingleRoIExtractor',
        roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2),
        out_channels=256,
        featmap_strides=[4, 8, 16, 32]),
    bbox_head=dict(
        type='SharedFCBBoxHead',
        num_fcs=2,
        in_channels=256,
        fc_out_channels=1024,
        roi_feat_size=7,
        num_classes=2,
        target_means=[0., 0., 0., 0.],
        target_stds=[0.1, 0.1, 0.2, 0.2],
        reg_class_agnostic=False,
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)))

# model training and testing settings

train_cfg = dict(
    rpn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.7,
            neg_iou_thr=0.3,
            min_pos_iou=0.3,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=256,
            pos_fraction=0.5,
            neg_pos_ub=-1,
            add_gt_as_proposals=False),
        allowed_border=0,
        pos_weight=-1,
        debug=False),
    rpn_proposal=dict(
        nms_across_levels=False,
        nms_pre=2000,
        nms_post=2000,
        max_num=2000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        assigner=dict(
            type='MaxIoUAssigner',
            pos_iou_thr=0.5,
            neg_iou_thr=0.5,
            min_pos_iou=0.5,
            ignore_iof_thr=-1),
        sampler=dict(
            type='RandomSampler',
            num=512,
            pos_fraction=0.25,
            neg_pos_ub=-1,
            add_gt_as_proposals=True),
        pos_weight=-1,
        debug=False))
test_cfg = dict(
    rpn=dict(
        nms_across_levels=False,
        nms_pre=1000,
        nms_post=1000,
        max_num=1000,
        nms_thr=0.7,
        min_bbox_size=0),
    rcnn=dict(
        score_thr=0.05, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)
    # soft-nms is also supported for rcnn testing
    # e.g., nms=dict(type='soft_nms', iou_thr=0.5, min_score=0.05)
)

# dataset settings

dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    imgs_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type='RepeatDataset',
        times=3,
        dataset=dict(
            type=dataset_type,
            ann_file=[
                data_root + 'VOC2007/ImageSets/Main/trainval.txt',
                data_root + 'VOC2012/ImageSets/Main/trainval.txt'
            ],
            img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
            pipeline=train_pipeline)),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))

evaluation = dict(interval=1, metric='mAP')

# optimizer
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(policy='step', step=[3])  # actual epoch = 3 * 3 = 9
checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
# runtime settings
total_epochs = 4  # actual epoch = 4 * 3 = 12
dist_params = dict(backend='nccl')
log_level = 'INFO'
work_dir = './work_dirs/faster_rcnn_r50_fpn_1x_voc0712'
load_from = None
resume_from = None
workflow = [('train', 1)]

2020-07-01 08:55:06,110 - mmdet - INFO - load model from: torchvision://resnet50
2020-07-01 08:55:06,219 - mmdet - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2020-07-01 08:55:07,733 - mmdet - INFO - Start running, host: drl@drl-Precision, work_dir: /home/drl/Downloads/mmdetection-1.1.0-later/merge-output
2020-07-01 08:55:07,733 - mmdet - INFO - workflow: [('train', 1)], max: 4 epochs
2020-07-01 08:55:20,814 - mmdet - INFO - Epoch [1][50/1055] lr: 0.01000, eta: 0:18:10, time: 0.262, data_time: 0.044, memory: 2414, loss_rpn_cls: 0.4801, loss_rpn_bbox: 0.1136, loss_cls: 0.3689, acc: 89.5977, loss_bbox: 0.0864, loss: 1.0489
2020-07-01 08:55:31,805 - mmdet - INFO - Epoch [1][100/1055] lr: 0.01000, eta: 0:16:31, time: 0.220, data_time: 0.002, memory: 2628, loss_rpn_cls: 0.4656, loss_rpn_bbox: 0.2256, loss_cls: 1.3382, acc: 81.2793, loss_bbox: 0.3114, loss: 2.3409
2020-07-01 08:55:42,948 - mmdet - INFO - Epoch [1][150/1055] lr: 0.01000, eta: 0:15:55, time: 0.223, data_time: 0.002, memory: 2628, loss_rpn_cls: 0.2369, loss_rpn_bbox: 0.1270, loss_cls: 0.5156, acc: 83.2285, loss_bbox: 0.3364, loss: 1.2159
2020-07-01 08:55:54,391 - mmdet - INFO - Epoch [1][200/1055] lr: 0.01000, eta: 0:15:37, time: 0.229, data_time: 0.002, memory: 2975, loss_rpn_cls: 0.1499, loss_rpn_bbox: 0.1164, loss_cls: 0.3844, acc: 83.4258, loss_bbox: 0.2948, loss: 0.9455
Traceback (most recent call last):
  File "tools/train.py", line 142, in <module>
    main()
  File "tools/train.py", line 138, in main
    meta=meta)
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/apis/train.py", line 111, in train_detector
    meta=meta)
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/apis/train.py", line 305, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/mmcv-0.5.9-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 384, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/mmcv-0.5.9-py3.6-linux-x86_64.egg/mmcv/runner/runner.py", line 279, in train
    for i, data_batch in enumerate(data_loader):
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 606, in _process_next_batch
    raise Exception("KeyError:" + batch.exc_msg)
Exception: KeyError:Traceback (most recent call last):
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/datasets/dataset_wrappers.py", line 52, in __getitem__
    return self.dataset[idx % self._ori_len]
  File "/home/drl/anaconda3/envs/mmlab/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 85, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/datasets/custom.py", line 132, in __getitem__
    data = self.prepare_train_img(idx)
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/datasets/custom.py", line 140, in prepare_train_img
    ann_info = self.get_ann_info(idx)
  File "/home/drl/Downloads/mmdetection-1.1.0-later/mmdet/datasets/xml_style.py", line 47, in get_ann_info
    label = self.cat2label[name]
KeyError: 'w'

I only have one class, and I have changed CLASSES = ('...') to CLASSES = ('apple',) in voc.py, but I still ran into this problem. Could you please help me? I am new to deep learning.
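Since the traceback shows the KeyError is raised when an object's <name> read from an annotation XML is looked up in the dataset's cat2label map, one way to narrow this down is to list every <name> that actually occurs in the annotations and compare the result against CLASSES in voc.py. A minimal sketch, assuming VOC-style annotations under data/VOCdevkit/VOC2007/Annotations/ (the path is an assumption; adjust it to your dataset):

# Sketch only: count every object <name> found in the VOC-style XML files so
# the result can be compared with CLASSES in mmdet/datasets/voc.py.
import glob
import xml.etree.ElementTree as ET
from collections import Counter

names = Counter()
for xml_path in glob.glob('data/VOCdevkit/VOC2007/Annotations/*.xml'):
    root = ET.parse(xml_path).getroot()
    for obj in root.findall('object'):
        names[obj.find('name').text] += 1

print(names)  # every key printed here must also appear in CLASSES

Any name printed here that is not a key of cat2label (for example a stray 'w') will reproduce exactly this KeyError.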

ZwwWayne commented 4 years ago

This bug should have been avoided in MMDetection V2.0, where you can directly set classes=('apple', ) in the config without modifying the code.
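For reference, a rough sketch of what that looks like in a v2.x-style config, written as a delta on top of a base VOC config (field names follow the v2.x "customize datasets" convention and should be checked against the installed version; this is not tied to the user's 1.1.0 setup):

# Sketch of a v2.x-style config fragment: class names are declared in the
# config instead of editing voc.py.
classes = ('apple',)
data = dict(
    train=dict(classes=classes),
    val=dict(classes=classes),
    test=dict(classes=classes))
# in v2.x the box head counts only foreground classes, so one class here
model = dict(roi_head=dict(bbox_head=dict(num_classes=1)))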

Ruolingdeng commented 4 years ago

@ZwwWayne But MMDetection V2.0 needs a higher version of CUDA and cuDNN. My system has other environment settings for other models, which require CUDA 9.0 and cuDNN 7.5. Without changing the CUDA and cuDNN versions, what can I do to solve this problem?

hellock commented 4 years ago

> @ZwwWayne But MMDetection V2.0 needs a higher version of CUDA and cuDNN. My system has other environment settings for other models, which require CUDA 9.0 and cuDNN 7.5. Without changing the CUDA and cuDNN versions, what can I do to solve this problem?

You can build PyTorch from source with CUDA 9.0.

yustaub commented 4 years ago

@hellock How do I build PyTorch from source, please?

hellock commented 4 years ago

Please refer to the official PyTorch documentation for installation.

ZwwWayne commented 4 years ago

> @ZwwWayne But MMDetection V2.0 needs a higher version of CUDA and cuDNN. My system has other environment settings for other models, which require CUDA 9.0 and cuDNN 7.5. Without changing the CUDA and cuDNN versions, what can I do to solve this problem?

What is your label? From the error it seems that the variable name is a single 'w', but it should be a meaningful word.
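One quick way to answer that is to print the class tuple and the label map the dataset actually built. A sketch, assuming the 1.x-era mmcv.Config.fromfile and mmdet.datasets.build_dataset helpers that this setup appears to use:

# Sketch: inspect what the dataset thinks its classes are and what keys its
# cat2label map contains (built from CLASSES in the 1.x XML-style dataset).
import mmcv
from mmdet.datasets import build_dataset

cfg = mmcv.Config.fromfile(
    'configs/pascal_voc/faster_rcnn_r50_fpn_1x_voc0712.py')
dataset = build_dataset(cfg.data.val)  # plain VOCDataset, no RepeatDataset wrapper
print(dataset.CLASSES)    # should be whole words, e.g. ('apple',)
print(dataset.cat2label)  # keys here must match the <name> tags in the XMLs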

NiuCY commented 4 years ago

Please check the syntax of tuples in Python 3 to understand why it should be written as classes = ('name',); this is not a bug.
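Concretely, the trailing comma is what matters: without it the parentheses are just grouping, CLASSES becomes a plain string, and the label map ends up keyed by single characters instead of class names.

# ('apple') is just the string 'apple'; only the trailing comma makes a tuple.
print(type(('apple')))    # <class 'str'>
print(type(('apple',)))   # <class 'tuple'>
# If CLASSES is accidentally a string, enumerating it keys the label map by
# single characters instead of class names.
print({c: i + 1 for i, c in enumerate('apple')})     # {'a': 1, 'p': 3, 'l': 4, 'e': 5}
print({c: i + 1 for i, c in enumerate(('apple',))})  # {'apple': 1}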