open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

[Bug] cityscapes instance/detection evaluation stage: `TypeError: Unsupported format: True` #9329

Closed sjiang95 closed 1 year ago

sjiang95 commented 1 year ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmdetection

Environment

env0

sys.platform: linux
Python: 3.7.15 (default, Nov  7 2022, 22:00:21) [GCC 11.2.0]
CUDA available: True
GPU 0: Quadro P620
CUDA_HOME: None
GCC: gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.12.0
OpenCV: 4.6.0
MMCV: 1.5.3
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.3+a8a8373

Because of limited GPU memory on env0, the image resolution in the pipelines is reduced to:

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(
        type='Resize', img_scale=[(1024, 400), (1024, 512)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 512),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

env1

sys.platform: linux
Python: 3.8.15 (default, Nov  4 2022, 20:59:55) [GCC 11.2.0]
CUDA available: True
GPU 0,1: NVIDIA GeForce RTX 3090
CUDA_HOME: None
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
PyTorch: 1.11.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.5.2 (Git Hash a9302535553c73243c632ad3c4c80beec3d19a1e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.3
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_37,code=compute_37
  - CuDNN 8.2
  - Magma 2.5.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.2.0, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.12.0
OpenCV: 4.5.4
MMCV: 1.5.3
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.3
MMDetection: 2.25.3+a8a8373

On both env0 and env1, PyTorch was installed with:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

Environment variables on env1:

$PATH

$ echo $PATH
/home/quan/miniconda3/envs/mmdet/bin:/home/quan/miniconda3/condabin:/home/quan/.vscode-server/bin/6261075646f055b99068d3688932416f2346dd3b/bin/remote-cli:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin

$LD_LIBRARY_PATH

$ echo $LD_LIBRARY_PATH
/home/quan/miniconda3/envs/torch/x86_64-conda-linux-gnu/sysroot:

$PYTHONPATH

$ echo $PYTHONPATH
(empty)

Reproduces the problem - code sample

The Cityscapes dataset was converted as described in "1: Inference and train with existing models and standard datasets" (MMDetection 2.25.1 documentation):

pip install cityscapesscripts

python tools/dataset_converters/cityscapes.py \
    ./data/cityscapes \
    --nproc 8 \
    --out-dir ./data/cityscapes/annotations

Single GPU (on env0):

python tools/train.py configs/cityscapes/faster_rcnn_r50_fpn_1x_cityscapes.py

Distributed (on env1):

bash tools/dist_train.sh configs/cityscapes/faster_rcnn_r50_fpn_1x_cityscapes.py 2

Reproduces the problem - command or script

same as above

Reproduces the problem - error message

The error messages are the same on env0 and env1.

2022-11-16 07:09:43,816 - mmdet - INFO - Saving checkpoint at 1 iterations
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 500/500, 0.9 task/s, elapsed: 548s, ETA:     0s
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
Traceback (most recent call last):
  File "tools/train.py", line 244, in <module>
    main()
  File "tools/train.py", line 240, in main
    meta=meta)
  File "/home/quan/Documents/my_repo/semclDetection/mmdet/apis/train.py", line 244, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 138, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 68, in train
    self.call_hook('after_train_iter')
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 309, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py", line 262, in after_train_iter
    self._do_evaluate(runner)
  File "/home/quan/Documents/my_repo/semclDetection/mmdet/core/evaluation/eval_hooks.py", line 63, in _do_evaluate
    key_score = self.evaluate(runner, results)
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/runner/hooks/evaluation.py", line 364, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/home/quan/Documents/my_repo/semclDetection/mmdet/datasets/cityscapes.py", line 267, in evaluate
    self.test_mode, self.filter_empty_gt)
  File "/home/quan/Documents/my_repo/semclDetection/mmdet/datasets/custom.py", line 110, in __init__
    self.proposals = self.load_proposals(local_path)
  File "/home/quan/Documents/my_repo/semclDetection/mmdet/datasets/custom.py", line 143, in load_proposals
    return mmcv.load(proposal_file)
  File "/home/quan/miniconda3/envs/mmdet/lib/python3.7/site-packages/mmcv/fileio/io.py", line 57, in load
    raise TypeError(f'Unsupported format: {file_format}')
TypeError: Unsupported format: True

Additional information

Training and evaluation on VOC and COCO work fine. For Cityscapes, the training stage itself runs without problems, so I believe the dataset conversion is not the cause of the error.

sjiang95 commented 1 year ago

Reason found.

In PR #9078, a new parameter, seg_suffix, was added to CustomDataset.__init__ (mmdetection/mmdet/datasets/custom.py at e71b499608e9c3ccd4211e7c815fa20eeedf18a2 · open-mmlab/mmdetection):

def __init__(self,
             ann_file,
             pipeline,
             classes=None,
             data_root=None,
             img_prefix='',
             seg_prefix=None,
             seg_suffix='.png',  # new entry
             proposal_file=None,
             test_mode=False,
             filter_empty_gt=True,
             file_client_args=dict(backend='disk')):
    ...  # body omitted; only the signature matters here

But in mmdet/datasets/cityscapes.py#L264, the positional call that constructs CocoDataset was left unchanged:

self_coco = CocoDataset(self.ann_file, self.pipeline.transforms,
                        None, self.data_root, self.img_prefix,
                        self.seg_prefix, self.proposal_file,
                        self.test_mode, self.filter_empty_gt)

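The positional arguments of the self_coco = CocoDataset(...) call in mmdet/datasets/cityscapes.py no longer match the new __init__ signature in mmdet/datasets/custom.py. Every argument after self.seg_prefix is shifted by one position: self.proposal_file is bound to seg_suffix, self.test_mode is bound to proposal_file, and self.filter_empty_gt is bound to test_mode. Since test_mode is True during evaluation, proposal_file is no longer None, so load_proposals is called with True instead of a file path, which matches the `True` in the error message.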
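The shift can be illustrated with a minimal, self-contained sketch. `old_init` and `new_init` are hypothetical stand-ins that copy only the parameter lists discussed above (`self` and the trailing `file_client_args` are left out because they do not affect the binding), and the argument values are made up for illustration.

```python
import inspect


def old_init(ann_file, pipeline, classes=None, data_root=None, img_prefix='',
             seg_prefix=None, proposal_file=None, test_mode=False,
             filter_empty_gt=True):
    """Hypothetical stand-in: parameter order before seg_suffix existed."""


def new_init(ann_file, pipeline, classes=None, data_root=None, img_prefix='',
             seg_prefix=None, seg_suffix='.png', proposal_file=None,
             test_mode=False, filter_empty_gt=True):
    """Hypothetical stand-in: parameter order after seg_suffix was added."""


def bind(func, args):
    """Return the parameter-name -> value mapping for a positional call."""
    bound = inspect.signature(func).bind(*args)
    bound.apply_defaults()
    return dict(bound.arguments)


# Made-up values in the same order as the call in cityscapes.py:
# ann_file, pipeline, classes, data_root, img_prefix,
# seg_prefix, proposal_file, test_mode, filter_empty_gt
positional_args = ('ann.json', [], None, None, 'img/', None, None, True, True)

old_binding = bind(old_init, positional_args)
new_binding = bind(new_init, positional_args)

for name in ('seg_suffix', 'proposal_file', 'test_mode'):
    print(name, '| old:', old_binding.get(name), '| new:', new_binding.get(name))
```

With the old parameter order, proposal_file receives None as intended. With the new order, the same nine positional values bind None to seg_suffix and True to proposal_file.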

I'll create a PR to fix this.

sjiang95 commented 1 year ago

This bug is fixed by PR #9330.
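As a general pattern (not necessarily the exact change made in the PR), passing the optional arguments by keyword keeps a call site correct even when a new parameter is inserted in the middle of a signature. The sketch below uses a hypothetical stand-in function, `new_init`, with the same parameter order as the updated `CustomDataset.__init__` (minus `self` and `file_client_args`), and made-up argument values.

```python
def new_init(ann_file, pipeline, classes=None, data_root=None, img_prefix='',
             seg_prefix=None, seg_suffix='.png', proposal_file=None,
             test_mode=False, filter_empty_gt=True):
    """Hypothetical stand-in that just reports what each parameter received."""
    return dict(seg_suffix=seg_suffix, proposal_file=proposal_file,
                test_mode=test_mode, filter_empty_gt=filter_empty_gt)


# Keyword arguments bind by name, so the inserted seg_suffix parameter
# keeps its default and nothing after it is shifted.
received = new_init(
    ann_file='ann.json',
    pipeline=[],
    classes=None,
    data_root=None,
    img_prefix='img/',
    seg_prefix=None,
    proposal_file=None,
    test_mode=True,
    filter_empty_gt=True)

print(received)
```

Here proposal_file stays None and seg_suffix keeps its default, regardless of where the new parameter sits in the signature.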

Test result

2022-11-16 09:39:56,842 - mmdet - INFO - workflow: [('train', 1)], max: 1000 iters
2022-11-16 09:39:56,843 - mmdet - INFO - Checkpoints will be saved to /home/quan/Documents/my_repo/semclDetection/work_dirs/faster_rcnn_r50_fpn_1x_cityscapes by HardDiskBackend.
2022-11-16 09:41:16,037 - mmdet - INFO - Iter [100/1000]        lr: 1.988e-04, eta: 0:11:52, time: 0.792, data_time: 0.004, memory: 1039, loss_rpn_cls: 0.0519, loss_rpn_bbox: 0.0627, loss_cls: 1.1775, acc: 68.8086, loss_bbox: 0.4482, loss: 1.7402
2022-11-16 09:42:35,523 - mmdet - INFO - Iter [200/1000]        lr: 3.986e-04, eta: 0:10:34, time: 0.795, data_time: 0.004, memory: 1104, loss_rpn_cls: 0.0441, loss_rpn_bbox: 0.0640, loss_cls: 0.4772, acc: 85.6465, loss_bbox: 0.3460, loss: 0.9313
2022-11-16 09:43:55,972 - mmdet - INFO - Iter [300/1000]        lr: 5.984e-04, eta: 0:09:17, time: 0.804, data_time: 0.004, memory: 1117, loss_rpn_cls: 0.0448, loss_rpn_bbox: 0.0676, loss_cls: 0.3586, acc: 88.1172, loss_bbox: 0.2444, loss: 0.7155
2022-11-16 09:45:16,004 - mmdet - INFO - Iter [400/1000]        lr: 7.982e-04, eta: 0:07:58, time: 0.800, data_time: 0.004, memory: 1117, loss_rpn_cls: 0.0413, loss_rpn_bbox: 0.0722, loss_cls: 0.3453, acc: 88.3613, loss_bbox: 0.2199, loss: 0.6787
2022-11-16 09:46:35,151 - mmdet - INFO - Saving checkpoint at 500 iterations
2022-11-16 09:46:35,819 - mmdet - INFO - Iter [500/1000]        lr: 9.980e-04, eta: 0:06:38, time: 0.798, data_time: 0.004, memory: 1298, loss_rpn_cls: 0.0478, loss_rpn_bbox: 0.0702, loss_cls: 0.3276, acc: 88.7852, loss_bbox: 0.2130, loss: 0.6586
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 500/500, 3.0 task/s, elapsed: 168s, ETA:     0s
loading annotations into memory...
Done (t=0.04s)
creating index...
index created!
loading annotations into memory...
Done (t=0.16s)
creating index...
index created!
2022-11-16 09:49:24,107 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.14s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=6.84s).
Accumulating evaluation results...
DONE (t=0.62s).
2022-11-16 09:49:31,801 - mmdet - INFO -
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.139
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.297
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.052
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.187
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.281
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.227
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.227
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.227
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.054
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.261
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.439

2022-11-16 09:49:31,831 - mmdet - INFO - Iter(val) [500]        bbox_mAP: 0.1390, bbox_mAP_50: 0.2970, bbox_mAP_75: -1.0000, bbox_mAP_s: 0.0520, bbox_mAP_m: 0.1870, bbox_mAP_l: 0.2810, bbox_mAP_copypaste: 0.139 0.297 -1.000 0.052 0.187 0.281
2022-11-16 09:50:51,997 - mmdet - INFO - Iter [600/1000]        lr: 1.000e-03, eta: 0:07:16, time: 2.562, data_time: 1.764, memory: 1298, loss_rpn_cls: 0.0417, loss_rpn_bbox: 0.0703, loss_cls: 0.3083, acc: 89.3789, loss_bbox: 0.1991, loss: 0.6194
2022-11-16 09:52:12,053 - mmdet - INFO - Iter [700/1000]        lr: 1.000e-03, eta: 0:05:15, time: 0.801, data_time: 0.004, memory: 1298, loss_rpn_cls: 0.0372, loss_rpn_bbox: 0.0690, loss_cls: 0.3128, acc: 89.4688, loss_bbox: 0.1874, loss: 0.6064
2022-11-16 09:53:30,970 - mmdet - INFO - Iter [800/1000]        lr: 1.000e-03, eta: 0:03:23, time: 0.789, data_time: 0.004, memory: 1298, loss_rpn_cls: 0.0370, loss_rpn_bbox: 0.0709, loss_cls: 0.3139, acc: 88.6914, loss_bbox: 0.1968, loss: 0.6186
2022-11-16 09:54:49,779 - mmdet - INFO - Iter [900/1000]        lr: 1.000e-03, eta: 0:01:39, time: 0.788, data_time: 0.004, memory: 1298, loss_rpn_cls: 0.0385, loss_rpn_bbox: 0.0743, loss_cls: 0.3071, acc: 89.1699, loss_bbox: 0.1938, loss: 0.6136
2022-11-16 09:56:09,959 - mmdet - INFO - Saving checkpoint at 1000 iterations
2022-11-16 09:56:10,754 - mmdet - INFO - Exp name: faster_rcnn_r50_fpn_1x_cityscapes.py
2022-11-16 09:56:10,754 - mmdet - INFO - Iter [1000/1000]       lr: 1.000e-03, eta: 0:00:00, time: 0.810, data_time: 0.004, memory: 1298, loss_rpn_cls: 0.0314, loss_rpn_bbox: 0.0671, loss_cls: 0.2992, acc: 89.1348, loss_bbox: 0.1924, loss: 0.5901
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 500/500, 3.0 task/s, elapsed: 168s, ETA:     0s
loading annotations into memory...
Done (t=0.15s)
creating index...
index created!
loading annotations into memory...
Done (t=0.05s)
creating index...
index created!
2022-11-16 09:58:58,904 - mmdet - INFO - Evaluating bbox...
Loading and preparing results...
DONE (t=0.02s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=5.92s).
Accumulating evaluation results...
DONE (t=0.55s).
2022-11-16 09:59:05,573 - mmdet - INFO -
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.206
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=1000 ] = 0.398
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=1000 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.064
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.240
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.411
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=300 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=1000 ] = 0.304
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = 0.067
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.340
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.547

2022-11-16 09:59:05,606 - mmdet - INFO - Exp name: faster_rcnn_r50_fpn_1x_cityscapes.py
2022-11-16 09:59:05,606 - mmdet - INFO - Iter(val) [500]        bbox_mAP: 0.2060, bbox_mAP_50: 0.3980, bbox_mAP_75: -1.0000, bbox_mAP_s: 0.0640, bbox_mAP_m: 0.2400, bbox_mAP_l: 0.4110, bbox_mAP_copypaste: 0.206 0.398 -1.000 0.064 0.240 0.411