open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

ZeroDivisionError: float division by zero with panet_r18_fpem_ffm_600e_icdar2015 #275

Open antoniolanza1996 opened 3 years ago

antoniolanza1996 commented 3 years ago

Describe the bug

If I print the results variable here, I obtain:

{'img': array([], shape=(0, 63, 3), dtype=uint8), 'filename': None, 'ori_filename': None, 'img_shape': (0, 63, 3), 'ori_shape': (0, 63, 3), 'img_fields': ['img']}

So I guess that this text detection prediction, which has height=0, breaks the whole pipeline. How can I filter out such erroneous predictions?
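
A minimal sketch of why an empty crop fails, assuming the resize step uses the same aspect-ratio arithmetic as the line shown in the traceback below. The helper name keep_ratio_width and the value dst_height=48 are hypothetical and chosen only for illustration:

```python
import math

import numpy as np


def keep_ratio_width(img, dst_height):
    """Mirror the aspect-ratio arithmetic from the traceback line (hypothetical helper)."""
    ori_height, ori_width = img.shape[:2]
    # Raises ZeroDivisionError when ori_height is 0.
    return math.ceil(float(dst_height) / ori_height * ori_width)


# Same shape as the printed results['img']: height 0, width 63.
empty_crop = np.zeros((0, 63, 3), dtype=np.uint8)

try:
    keep_ratio_width(empty_crop, dst_height=48)  # dst_height is an assumed value
    raised = False
except ZeroDivisionError:
    raised = True

print("ZeroDivisionError raised:", raised)
```

Any crop with a non-zero height goes through the same arithmetic without error, so only degenerate crops need to be filtered out.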

Environment

sys.platform: linux
Python: 3.7.10 (default, May  3 2021, 02:48:31) [GCC 7.5.0]
CUDA available: True
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.0_bu.TC445_37.28845127_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.5.0+cu101
PyTorch compiling details: PyTorch built with:
  - GCC 7.3
  - C++ Version: 201402
  - Intel(R) Math Kernel Library Version 2019.0.5 Product Build 20190808 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 10.1
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.3
  - Magma 2.5.2
  - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF, 

TorchVision: 0.6.0+cu101
OpenCV: 4.1.2
MMCV: 1.3.5
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMOCR: 0.2.0+bb44475

Error traceback



/content/gdrive/MyDrive/OCR_v2/mmocr/mmocr/apis/inference.py in model_inference(model, imgs, batch_mode)
     82 
     83         # build the data pipeline
---> 84         data = test_pipeline(data)
     85         # get tensor from list to stack for batch mode (text detection)
     86         if batch_mode:

/usr/local/lib/python3.7/dist-packages/mmdet/datasets/pipelines/compose.py in __call__(self, data)
     38 
     39         for t in self.transforms:
---> 40             data = t(data)
     41             if data is None:
     42                 return None

/content/gdrive/MyDrive/OCR_v2/mmocr/mmocr/datasets/pipelines/ocr_transforms.py in __call__(self, results)
     86 
     87         if self.keep_aspect_ratio:
---> 88             new_width = math.ceil(float(dst_height) / ori_height * ori_width)
     89             width_divisor = int(1 / self.width_downsample_ratio)
     90             # make sure new_width is an integral multiple of width_divisor.

ZeroDivisionError: float division by zero

cuhk-hbsun commented 3 years ago

@antoniolanza1996 the code you mentioned above is for text recognition, but the config in the issue title is for text detection. Why would panet_r18_fpem_ffm_600e_icdar2015.py use ResizeOCR? Is anything wrong?

antoniolanza1996 commented 3 years ago

Hi @cuhk-hbsun, I'm actually running the demo/ocr_image_demo.py script with panet_r18_fpem_ffm_600e_icdar2015 as the text detector and robustscanner_r31_academic as the text recognizer, and that is where I get the error mentioned above.

If I only switch the text detector (e.g. to psenet_r50_fpnf_600e_icdar2015), this problem doesn't come up. Hence I thought that panet_r18_fpem_ffm_600e_icdar2015 predicts some erroneous bounding boxes (e.g. with zero height or width). Indeed, if I add this simple check after this line of code:

# Skip degenerate crops (zero height or width) before recognition.
height, width = box_img.shape[:2]
if height == 0 or width == 0:
    continue

With this check, these erroneous predictions are skipped and the text recognition step runs fine.
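
For reference, the same check can be written as a self-contained sketch. The crops list and the is_valid_crop helper are hypothetical names used only for illustration; the demo script itself is not reproduced here:

```python
import numpy as np


def is_valid_crop(box_img):
    """Return True when the crop has non-zero height and width."""
    height, width = box_img.shape[:2]
    return height > 0 and width > 0


# Stand-ins for the crops cut out of the image for each detected box.
crops = [
    np.zeros((0, 63, 3), dtype=np.uint8),    # degenerate crop, as in the report
    np.zeros((32, 100, 3), dtype=np.uint8),  # ordinary crop
]

# Only valid crops would be passed on to the text recognizer.
valid_crops = [crop for crop in crops if is_valid_crop(crop)]
print(len(valid_crops), "of", len(crops), "crops kept")
```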

I see two possible solutions here:

  1. simply add this check to the demo/ocr_image_demo.py script
  2. make sure that the detection model doesn't produce these predictions before returning bounding boxes in the model_inference API here (more robust and generalizable, I guess).
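
A rough sketch of what the second option could look like, assuming each detected boundary is a flat list [x1, y1, ..., xn, yn, score]. That layout, the function name filter_degenerate_boundaries, and the min_size parameter are assumptions made for illustration, not the actual MMOCR implementation:

```python
import numpy as np


def filter_degenerate_boundaries(boundaries, img_shape, min_size=1.0):
    """Drop boundaries whose extent inside the image is below min_size pixels.

    Assumes each boundary is a flat list [x1, y1, ..., xn, yn, score].
    """
    img_h, img_w = img_shape[:2]
    kept = []
    for boundary in boundaries:
        # Drop the trailing score and view the rest as (x, y) pairs.
        coords = np.asarray(boundary[:-1], dtype=np.float32).reshape(-1, 2)
        # Clip to the image so boxes lying outside it collapse to zero extent.
        xs = np.clip(coords[:, 0], 0, img_w)
        ys = np.clip(coords[:, 1], 0, img_h)
        width = float(xs.max() - xs.min())
        height = float(ys.max() - ys.min())
        if width >= min_size and height >= min_size:
            kept.append(boundary)
    return kept


flat_box = [10, 20, 73, 20, 73, 20, 10, 20, 0.9]   # zero height
good_box = [0, 0, 100, 0, 100, 32, 0, 32, 0.8]     # 100 x 32 pixels
kept = filter_degenerate_boundaries([flat_box, good_box], img_shape=(480, 640, 3))
print(len(kept), "boundary kept")
```

Filtering at this stage means every caller of the detector benefits, whereas the check in the demo script only protects that one script.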