open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0
29.61k stars 9.47k forks

KeyError when using custom pipelines #5740

Closed novice03 closed 3 years ago

novice03 commented 3 years ago

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug I am training detection models on a dataset from Kaggle. I converted the data to COCO format and checked that there are no errors in the conversion. When I use a pipeline that is different from the default, validation throws an error. This error does NOT occur if I don't make any modifications to the data pipelines.
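
For reference, the converted annotation files follow the standard COCO detection layout. A minimal sketch (placeholder file name and box values, single 'opacity' category) looks roughly like this:

coco_skeleton = dict(
    images=[dict(id=1, file_name='0001.png', width=1024, height=1024)],
    annotations=[dict(
        id=1,
        image_id=1,
        category_id=1,
        bbox=[100.0, 150.0, 200.0, 180.0],  # COCO boxes are [x, y, width, height]
        area=200.0 * 180.0,
        iscrowd=0)],
    categories=[dict(id=1, name='opacity')])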

Reproduction

  1. What command or script did you run?
!python tools/train.py configs/siim/siim.py
  2. Did you make any modifications on the code or config? Did you understand what you have modified?

Yes, I created the config file as follows:

cfg = Config.fromfile('/content/mmdetection/configs/vfnet/vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco.py')

cfg.model.bbox_head.num_classes = 1

cfg.classes = ('opacity', )

cfg.data.samples_per_gpu = 4
cfg.data.train.img_prefix = '/content/images/train'
cfg.data.train.ann_file = '/content/annotations/train.json'
cfg.data.train.classes = cfg.classes
cfg.data.val.img_prefix = '/content/images/val'
cfg.data.val.ann_file = '/content/annotations/val.json'
cfg.data.val.classes = cfg.classes
cfg.data.test.img_prefix = '/content/images/val'
cfg.data.test.ann_file = '/content/annotations/val.json'
cfg.data.test.classes = cfg.classes
cfg.evaluation.metric = 'bbox'

img_norm_cfg = dict(
    mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], to_rgb=True)

cfg.train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Resize',
        img_scale=[(1333, 480), (1333, 960)],
        multiscale_mode='range',
        keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]

cfg.data.train.pipeline = cfg.train_pipeline
cfg.data.val.pipeline = cfg.train_pipeline

cfg.optimizer.lr = 0.02 / 8
cfg.lr_config = dict(
    policy = 'CosineAnnealing', 
    by_epoch = False,
    warmup = 'linear', 
    warmup_iters = 500, 
    warmup_ratio = 0.001,
    min_lr = 1e-07)

cfg.runner.max_epochs = 12
cfg.load_from = '/content/drive/MyDrive/vfnet_r50_fpn_mdconv_c3-c5_mstrain_2x_coco_20201027pth-6879c318.pth'

cfg.dump('configs/siim/siim.py')

If I understand correctly, cfg.data.train.pipeline and cfg.data.val.pipeline are both set to the train_pipeline above.
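
For comparison, the base VFNet config keeps a separate test-time pipeline for the val and test splits: the transforms are wrapped in MultiScaleFlipAug and there is no LoadAnnotations step. A rough sketch (reusing the img_norm_cfg defined above) would be:

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

Assigning something like this to cfg.data.val.pipeline and cfg.data.test.pipeline, rather than the training pipeline, matches how the stock configs run evaluation.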

  3. What dataset did you use?

A dataset from Kaggle converted to COCO format.

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)
sys.platform: linux
Python: 3.7.11 (default, Jul  3 2021, 18:01:19) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla P100-PCIE-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.9.0+cu102
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.1.2 (Git Hash 98be7e8afa711dc9b66c8ff3504129cb82013cdb)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.9.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

TorchVision: 0.10.0+cu102
OpenCV: 4.1.2
MMCV: 1.3.9
MMCV Compiler: GCC 7.5
MMCV CUDA Compiler: 11.0
MMDetection: 2.14.0+4853ea1

Error traceback If applicable, paste the error traceback here.

[                                                  ] 0/1267, elapsed: 0s, ETA:Traceback (most recent call last):
  File "tools/train.py", line 188, in <module>
    main()
  File "tools/train.py", line 184, in main
    meta=meta)
  File "/content/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/hooks/evaluation.py", line 220, in after_train_epoch
    self._do_evaluate(runner)
  File "/content/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 17, in _do_evaluate
    results = single_gpu_test(runner.model, self.dataloader, show=False)
  File "/content/mmdetection/mmdet/apis/test.py", line 25, in single_gpu_test
    for i, data in enumerate(data_loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/mmdetection/mmdet/datasets/custom.py", line 192, in __getitem__
    return self.prepare_test_img(idx)
  File "/content/mmdetection/mmdet/datasets/custom.py", line 235, in prepare_test_img
    return self.pipeline(results)
  File "/content/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
    data = t(data)
  File "/content/mmdetection/mmdet/datasets/pipelines/loading.py", line 370, in __call__
    results = self._load_bboxes(results)
  File "/content/mmdetection/mmdet/datasets/pipelines/loading.py", line 245, in _load_bboxes
    ann_info = results['ann_info']
KeyError: 'ann_info'
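
The last frames point at the dataset's test-mode path: during evaluation, CustomDataset.__getitem__ calls prepare_test_img, which only puts img_info into the results dict, so a LoadAnnotations transform in the validation pipeline finds no 'ann_info' key. An abridged sketch of mmdet/datasets/custom.py in this version (proposal handling omitted):

def prepare_train_img(self, idx):
    img_info = self.data_infos[idx]
    ann_info = self.get_ann_info(idx)
    # training path: both img_info and ann_info are passed to the pipeline
    results = dict(img_info=img_info, ann_info=ann_info)
    self.pre_pipeline(results)
    return self.pipeline(results)

def prepare_test_img(self, idx):
    img_info = self.data_infos[idx]
    # test/val path: no ann_info, so LoadAnnotations raises KeyError here
    results = dict(img_info=img_info)
    self.pre_pipeline(results)
    return self.pipeline(results)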

Bug fix If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!

alexzhuuuu commented 10 months ago

How was your problem solved? Apart from the KeyError: 'ann_info', there are other missing-key problems in my projects. Where should I modify them?