open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

IndexError: Caught IndexError in DataLoader worker process 0 IndexError: index 2 is out of bounds for axis 0 with size 2 #614

Closed thongvhoang closed 2 years ago

thongvhoang commented 2 years ago

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. The bug has not been fixed in the latest version.

Describe the bug
I trained fcenet_r50dcnv2_fpn_1500e_ctw1500 and fcenet_r50_fpn_1500e_icdar2015 on ICDAR2015 (my custom dataset). Training DBNet on the same data succeeds, but training FCENet fails. The config/log file from the training run is attached here: 20211123_174847.log.zip

2021-11-23 17:48:52,519 - mmocr - INFO - workflow: [('train', 1)], max: 100 epochs
2021-11-23 17:48:52,519 - mmocr - INFO - Checkpoints will be saved to /content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/fcenet by HardDiskBackend.
2021-11-23 17:49:28,379 - mmocr - INFO - Epoch [1][10/276]  lr: 1.000e-04, eta: 1 day, 3:21:08, time: 3.569, data_time: 2.821, memory: 7153, loss_text: 2.2355, loss_center: 2.0060, loss_reg_x: 6.6989, loss_reg_y: 4.4823, loss: 15.4226
2021-11-23 17:50:03,808 - mmocr - INFO - Epoch [1][20/276]  lr: 1.000e-04, eta: 1 day, 3:14:33, time: 3.543, data_time: 2.782, memory: 7153, loss_text: 1.7876, loss_center: 2.0288, loss_reg_x: 6.2060, loss_reg_y: 4.0820, loss: 14.1043
2021-11-23 17:50:31,123 - mmocr - INFO - Epoch [1][30/276]  lr: 1.000e-04, eta: 1 day, 1:07:38, time: 2.731, data_time: 1.962, memory: 7153, loss_text: 1.5562, loss_center: 1.8024, loss_reg_x: 5.6532, loss_reg_y: 2.9423, loss: 11.9542
2021-11-23 17:50:55,417 - mmocr - INFO - Epoch [1][40/276]  lr: 1.000e-04, eta: 23:29:19, time: 2.430, data_time: 1.668, memory: 7153, loss_text: 1.4154, loss_center: 1.6844, loss_reg_x: 5.0021, loss_reg_y: 2.6096, loss: 10.7115
2021-11-23 17:51:17,041 - mmocr - INFO - Epoch [1][50/276]  lr: 1.000e-04, eta: 22:05:37, time: 2.162, data_time: 1.400, memory: 7153, loss_text: 1.3562, loss_center: 1.6468, loss_reg_x: 5.0777, loss_reg_y: 3.0315, loss: 11.1122
2021-11-23 17:51:41,480 - mmocr - INFO - Epoch [1][60/276]  lr: 1.000e-04, eta: 21:31:15, time: 2.444, data_time: 1.681, memory: 7153, loss_text: 1.0985, loss_center: 1.3259, loss_reg_x: 4.2499, loss_reg_y: 2.1822, loss: 8.8565
2021-11-23 17:52:07,019 - mmocr - INFO - Epoch [1][70/276]  lr: 1.000e-04, eta: 21:13:46, time: 2.554, data_time: 1.783, memory: 7153, loss_text: 1.2672, loss_center: 1.4510, loss_reg_x: 4.8901, loss_reg_y: 2.8632, loss: 10.4715
2021-11-23 17:52:34,146 - mmocr - INFO - Epoch [1][80/276]  lr: 1.000e-04, eta: 21:09:40, time: 2.713, data_time: 1.955, memory: 7153, loss_text: 1.1799, loss_center: 1.3349, loss_reg_x: 4.8116, loss_reg_y: 2.4007, loss: 9.7271
2021-11-23 17:53:02,262 - mmocr - INFO - Epoch [1][90/276]  lr: 1.000e-04, eta: 21:11:25, time: 2.812, data_time: 2.046, memory: 7153, loss_text: 1.1198, loss_center: 1.3739, loss_reg_x: 4.6463, loss_reg_y: 2.5160, loss: 9.6561
2021-11-23 17:53:26,086 - mmocr - INFO - Epoch [1][100/276] lr: 1.000e-04, eta: 20:53:03, time: 2.382, data_time: 1.618, memory: 7153, loss_text: 1.2300, loss_center: 1.4093, loss_reg_x: 5.2350, loss_reg_y: 3.0039, loss: 10.8782
2021-11-23 17:53:49,918 - mmocr - INFO - Epoch [1][110/276] lr: 1.000e-04, eta: 20:38:00, time: 2.383, data_time: 1.614, memory: 7153, loss_text: 1.2258, loss_center: 1.4517, loss_reg_x: 5.3001, loss_reg_y: 2.4525, loss: 10.4300
2021-11-23 17:54:14,551 - mmocr - INFO - Epoch [1][120/276] lr: 1.000e-04, eta: 20:28:26, time: 2.463, data_time: 1.691, memory: 7153, loss_text: 1.1217, loss_center: 1.3953, loss_reg_x: 4.7827, loss_reg_y: 2.1281, loss: 9.4278
2021-11-23 17:54:44,655 - mmocr - INFO - Epoch [1][130/276] lr: 1.000e-04, eta: 20:39:32, time: 3.010, data_time: 2.246, memory: 7153, loss_text: 1.0440, loss_center: 1.2097, loss_reg_x: 4.6751, loss_reg_y: 1.9459, loss: 8.8746
2021-11-23 17:55:19,917 - mmocr - INFO - Epoch [1][140/276] lr: 1.000e-04, eta: 21:05:51, time: 3.526, data_time: 2.762, memory: 7153, loss_text: 1.1682, loss_center: 1.3761, loss_reg_x: 4.5412, loss_reg_y: 2.6301, loss: 9.7156
2021-11-23 17:55:46,037 - mmocr - INFO - Epoch [1][150/276] lr: 1.000e-04, eta: 21:00:42, time: 2.612, data_time: 1.846, memory: 7153, loss_text: 1.3153, loss_center: 1.5453, loss_reg_x: 5.7855, loss_reg_y: 2.3081, loss: 10.9542
2021-11-23 17:56:11,265 - mmocr - INFO - Epoch [1][160/276] lr: 1.000e-04, eta: 20:53:35, time: 2.523, data_time: 1.756, memory: 7153, loss_text: 1.1700, loss_center: 1.2924, loss_reg_x: 4.7299, loss_reg_y: 2.3905, loss: 9.5828
2021-11-23 17:57:02,091 - mmocr - INFO - Epoch [1][170/276] lr: 1.000e-04, eta: 21:56:06, time: 5.083, data_time: 4.320, memory: 7153, loss_text: 1.0417, loss_center: 1.1694, loss_reg_x: 4.3365, loss_reg_y: 1.7992, loss: 8.3468
2021-11-23 17:57:23,990 - mmocr - INFO - Epoch [1][180/276] lr: 1.000e-04, eta: 21:38:07, time: 2.190, data_time: 1.430, memory: 7153, loss_text: 0.8899, loss_center: 1.1139, loss_reg_x: 3.8824, loss_reg_y: 1.9227, loss: 7.8090
2021-11-23 17:57:50,112 - mmocr - INFO - Epoch [1][190/276] lr: 1.000e-04, eta: 21:32:10, time: 2.612, data_time: 1.850, memory: 7153, loss_text: 1.2094, loss_center: 1.3650, loss_reg_x: 4.6892, loss_reg_y: 2.1981, loss: 9.4617
2021-11-23 17:58:14,250 - mmocr - INFO - Epoch [1][200/276] lr: 1.000e-04, eta: 21:22:13, time: 2.414, data_time: 1.645, memory: 7153, loss_text: 0.9599, loss_center: 1.1418, loss_reg_x: 4.1219, loss_reg_y: 1.8467, loss: 8.0703
2021-11-23 17:58:45,136 - mmocr - INFO - Epoch [1][210/276] lr: 1.000e-04, eta: 21:27:51, time: 3.089, data_time: 2.319, memory: 7153, loss_text: 0.9811, loss_center: 1.1552, loss_reg_x: 3.9769, loss_reg_y: 1.9601, loss: 8.0733
2021-11-23 17:59:13,359 - mmocr - INFO - Epoch [1][220/276] lr: 1.000e-04, eta: 21:27:24, time: 2.822, data_time: 2.062, memory: 7153, loss_text: 1.1233, loss_center: 1.2776, loss_reg_x: 4.6797, loss_reg_y: 1.7720, loss: 8.8526
2021-11-23 17:59:38,868 - mmocr - INFO - Epoch [1][230/276] lr: 1.000e-04, eta: 21:21:34, time: 2.551, data_time: 1.788, memory: 7153, loss_text: 1.1358, loss_center: 1.1804, loss_reg_x: 4.3514, loss_reg_y: 1.8128, loss: 8.4804
2021-11-23 18:00:10,258 - mmocr - INFO - Epoch [1][240/276] lr: 1.000e-04, eta: 21:27:22, time: 3.139, data_time: 2.374, memory: 7153, loss_text: 1.2964, loss_center: 1.3798, loss_reg_x: 4.5636, loss_reg_y: 2.1423, loss: 9.3821
2021-11-23 18:00:34,916 - mmocr - INFO - Epoch [1][250/276] lr: 1.000e-04, eta: 21:20:23, time: 2.466, data_time: 1.700, memory: 7153, loss_text: 1.0350, loss_center: 1.2190, loss_reg_x: 3.8675, loss_reg_y: 1.8710, loss: 7.9926
2021-11-23 18:00:57,309 - mmocr - INFO - Epoch [1][260/276] lr: 1.000e-04, eta: 21:09:56, time: 2.239, data_time: 1.469, memory: 7153, loss_text: 0.9346, loss_center: 1.0585, loss_reg_x: 3.0851, loss_reg_y: 1.6522, loss: 6.7304

Traceback (most recent call last):
  File "tools/train.py", line 221, in <module>
    main()
  File "tools/train.py", line 217, in main
    meta=meta)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/apis/train.py", line 165, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1085, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1111, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 198, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.7/dist-packages/mmdet/datasets/custom.py", line 195, in __getitem__
    data = self.prepare_train_img(idx)
  File "/usr/local/lib/python3.7/dist-packages/mmdet/datasets/custom.py", line 218, in prepare_train_img
    return self.pipeline(results)
  File "/usr/local/lib/python3.7/dist-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
    data = t(data)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/base_textdet_targets.py", line 167, in __call__
    results = self.generate_targets(results)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 350, in generate_targets
    polygon_masks_ignore)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 316, in generate_level_targets
    level_img_size, lv_text_polys[ind])[None]
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 71, in generate_center_region_mask
    top_line, bot_line, self.resample_step)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py", line 281, in resample_sidelines
    resampled_line1 = self.resample_line(sideline1, resample_point_num)
  File "/content/drive/My Drive/Colab_Notebook/text_scence_detection/mmocr/mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py", line 231, in resample_line
    while current_line_len >= length_cumsum[current_edge_ind + 1]:
IndexError: index 2 is out of bounds for axis 0 with size 2
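
For what it's worth, the failing line is the `while current_line_len >= length_cumsum[current_edge_ind + 1]` loop in `resample_line` (quoted in the traceback above). The snippet below is a minimal sketch with made-up numbers, not MMOCR's actual code, of one plausible way this indexing pattern can go out of bounds: a side line whose cumulative-length array has size 2, e.g. because a duplicated polygon point produces a zero-length edge, reproduces exactly the reported `index 2 is out of bounds for axis 0 with size 2`.

```python
import numpy as np

# Hypothetical side line with a zero-length final edge (duplicated point),
# e.g. from a degenerate polygon annotation. Illustration only.
line = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 0.0]])
edge_len = np.linalg.norm(np.diff(line, axis=0), axis=1)  # [10., 0.]
length_cumsum = np.cumsum(edge_len)                       # [10., 10.] -> size 2

current_edge_ind = 0
current_line_len = 10.0  # resample position lands exactly on the total length
while current_line_len >= length_cumsum[current_edge_ind + 1]:
    current_edge_ind += 1
# 1st check: length_cumsum[1] == 10.0 -> True, current_edge_ind becomes 1
# 2nd check: length_cumsum[2] -> IndexError: index 2 is out of bounds for axis 0 with size 2
```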

Reproduction

  1. What command or script did you run?
    • I used Google Colab to train the model:
!python tools/train.py configs/textdet/fcenet/fcenet_r50dcnv2_fpn_1500e_ctw1500.py \
        --work-dir fcenet 

or:

!python tools/train.py configs/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015.py \
        --work-dir fcenet 
  2. Did you make any modifications on the code or config? Did you understand what you have modified?

    • Yes, I did. I only modified `workers_per_gpu=0` and the learning rate (a rough sketch of the change is shown after this list).
  3. What dataset did you use? I used the ICDAR format, specifically ICDAR2015 (my custom dataset).
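
For reference, the change against the default config was roughly the following; only `workers_per_gpu` and `lr` were touched, and the remaining values shown are assumed defaults included for context, so they may differ slightly from the released config:

```python
# Sketch of the config override; only workers_per_gpu and lr differ from the
# defaults, the other fields are shown for context only (assumed values).
data = dict(
    samples_per_gpu=8,    # assumed default, unchanged
    workers_per_gpu=0,    # changed: load data in the main process
    # train / val / test dataset definitions unchanged
)
optimizer = dict(type='SGD', lr=1e-4)  # lr lowered to 1e-4, matching the training log above
```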

Environment

  1. Please run `python mmocr/utils/collect_env.py` to collect the necessary environment information and paste it here.

    
    sys.platform: linux
    Python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla P100-PCIE-16GB
    CUDA_HOME: /usr/local/cuda
    NVCC: Build cuda_11.1.TC455_06.29190527_0
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.7.0+cu110
    PyTorch compiling details: PyTorch built with:
    - GCC 7.3
    - C++ Version: 201402
    - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
    - Intel(R) MKL-DNN v1.6.0 (Git Hash 5ef631a030a6f73131c77892041042805a06064f)
    - OpenMP 201511 (a.k.a. OpenMP 4.5)
    - NNPACK is enabled
    - CPU capability usage: AVX2
    - CUDA Runtime 11.0
    - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80
    - CuDNN 8.0.4
    - Magma 2.5.2
    - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, 

    TorchVision: 0.8.0
    OpenCV: 4.1.2
    MMCV: 1.3.17
    MMCV Compiler: GCC 7.5
    MMCV CUDA Compiler: 11.1
    MMOCR: 0.3.0+53b050b

  2. You may add additional information that may be helpful for locating the problem, such as:
    - How you installed PyTorch [e.g., pip, conda, source]
            - I followed MMOCR's tutorial (section **Install Dependencies**): https://github.com/open-mmlab/mmocr/blob/main/demo/MMOCR_Tutorial.ipynb
              Commands:

!pip install -U torch==1.7.0+cu110 torchvision==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html

Install mmcv-full so that we can use CUDA operators:

!pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.6.0/index.html

Install mmdetection and MMOCR:

!pip install mmdet
%cd mmocr
!pip install -r requirements.txt
!pip install -v -e .

    - Other environment variables that may be related (such as `$PATH`, `$LD_LIBRARY_PATH`, `$PYTHONPATH`, etc.)

**Error traceback**
The full traceback is identical to the one pasted in the bug description above.

Bug fix
If you have already identified the reason, you can provide the information here. If you are willing to create a PR to fix it, please also leave a comment here and that would be much appreciated!
Training DBNet (text detection) on my custom dataset succeeds, but training fcenet_r50dcnv2_fpn_1500e_ctw1500 produces the error above. Please help me fix this bug. Thank you.
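
In case it helps narrow things down, a small script like the one below (just a sketch, using the same config path as the training command above) should identify which training sample triggers the target-generation error, since indexing the dataset directly runs the same pipeline that fails inside the DataLoader worker:

```python
# Sketch: iterate the training set sample by sample to find the annotation
# whose FCENet target generation raises the IndexError.
from mmcv import Config
from mmdet.datasets import build_dataset  # MMOCR 0.x builds datasets via mmdet

cfg = Config.fromfile('configs/textdet/fcenet/fcenet_r50_fpn_1500e_icdar2015.py')
dataset = build_dataset(cfg.data.train)

for idx in range(len(dataset)):
    try:
        dataset[idx]  # applies the full training pipeline, including generate_targets
    except IndexError as exc:
        print(f'sample index {idx} failed: {exc}')
```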

gaotongxiao commented 2 years ago

ping @HolyCrap96