v2.5.0 broke OD model training on custom dataset?

mcdy143 commented 4 years ago

After updating to v2.5.0, training faster_rcnn model on customized dataset (COCO-like format) seem to have broken. The validation accuracy does stays close to 0 starting at the second epoch and stays 0. The trained model does not generate any bounding box even at confidence threshold=0.

After reverting back to v2.4.0, training worked like before using the same dataset and same config files. Validation output from one epoch: Evaluate annotation type *bbox* DONE (t=0.24s). Accumulating evaluation results... DONE (t=0.10s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.002 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.002

RyanXLi commented 4 years ago

Hi! Could you follow the template of reporting an error so we can get more information?

mcdy143 commented 4 years ago

Sorry for the late reply. Please see below:

Reproduction

What command or script did you run?

nohup python -u tools/train.py configs_od_entryway/cascade_rcnn/cascade_rcnn_x101_64x4d_fpn_1x_coco.py --work-dir work_dirs/OD_entryway_cascade_x101 &

Did you make any modifications on the code or config? Did you understand what you have modified? Yes. Modified files include: 1). Added custom CoCo format dataset under mmdetection/mmdet/datasets/ 2). Added new data class in mmdetection/mmdet/datasets/init.py 3). modified config files base/datasets/coco_detection.py and base/model/cascade_rcnn_r50_fpn.py 's num_classes to match number of classes in new dataset.
What dataset did you use? modified coco dataset with 12 classes

Environment

sys.platform: linux
Python: 3.7.8 | packaged by conda-forge | (default, Jul 31 2020, 02:25:08) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla V100-SXM2-16GB
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.0, V10.0.130
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.4.0
MMCV: 1.2.0
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.6.0+885b86b

Error traceback If applicable, paste the error trackback here.

Evaluate annotation type *bbox* DONE (t=0.24s). Accumulating evaluation results... DONE (t=0.10s). Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000 Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=1000 ] = 0.002 Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=1000 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000 Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000 Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=300 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=1000 ] = 0.002 Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=1000 ] = -1.000 Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=1000 ] = 0.000 Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=1000 ] = 0.002

ZwwWayne commented 3 years ago

This is because that the version before MMDetection v2.6 has issues with the customized classes on COCODataset. Please try the newest MMDet and see whether you still meet the issue.

open-mmlab / mmdetection

v2.5.0 broke OD model training on custom dataset? #4003