train sdmgr loss Nan

mrlihellohorld commented 2 years ago

I get an error when training the SDMGR model:

    ssh://root@172.16.135.60:2244/opt/conda/bin/python -u /mmocr/tools/train.py configs/kie/sdmgr/sdmgr_unet16_60e_subtitile_classify.py
    2021-11-16 04:41:01,896 - mmocr - INFO - Environment info:
    sys.platform: linux
    Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
    CUDA available: True
    GPU 0: GeForce RTX 2070
    CUDA_HOME: /usr/local/cuda
    NVCC: Cuda compilation tools, release 10.1, V10.1.243
    GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
    PyTorch: 1.5.0
    PyTorch compiling details: PyTorch built with:
      - GCC 7.3
      - C++ Version: 201402
      - Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
      - Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
      - OpenMP 201511 (a.k.a. OpenMP 4.5)
      - NNPACK is enabled
      - CPU capability usage: AVX2
      - CUDA Runtime 10.1
      - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
      - CuDNN 7.6.3
      - Magma 2.5.2
      - Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.6.0a0+82fd1c8
    OpenCV: 4.5.2
    MMCV: 1.3.4
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 10.1
    MMOCR: 0.2.0+ae90dea

    2021-11-16 04:41:05,357 - mmocr - INFO - Distributed training: False
    2021-11-16 04:41:08,726 - mmocr - INFO - Config:
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    max_scale = 1024
    min_scale = 512
    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.0),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='KIEFormatBundle'),
        dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.0),
        dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='KIEFormatBundle'),
        dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
    ]
    dataset_type = 'KIEDataset'
    data_root = '/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train'
    loader = dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations']))
    train = dict(
        type='KIEDataset',
        ann_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/train.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='KIEFormatBundle'),
            dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
        ],
        img_prefix='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train',
        loader=dict(
            type='HardDiskLoader', repeat=1,
            parser=dict(type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])),
        dict_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/dict.txt',
        test_mode=False)
    test = dict(
        type='KIEDataset',
        ann_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/test.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='KIEFormatBundle'),
            dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
        ],
        img_prefix='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train',
        loader=dict(
            type='HardDiskLoader', repeat=1,
            parser=dict(type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])),
        dict_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/dict.txt',
        test_mode=True)
    data = dict(
        samples_per_gpu=1,
        workers_per_gpu=1,
        train=dict(
            type='KIEDataset',
            ann_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/train.txt',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.0),
                dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='KIEFormatBundle'),
                dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
            ],
            img_prefix='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train',
            loader=dict(
                type='HardDiskLoader', repeat=1,
                parser=dict(type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])),
            dict_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/dict.txt',
            test_mode=False),
        val=dict(
            type='KIEDataset',
            ann_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/test.txt',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.0),
                dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='KIEFormatBundle'),
                dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
            ],
            img_prefix='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train',
            loader=dict(
                type='HardDiskLoader', repeat=1,
                parser=dict(type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])),
            dict_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/dict.txt',
            test_mode=True),
        test=dict(
            type='KIEDataset',
            ann_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/test.txt',
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
                dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
                dict(type='RandomFlip', flip_ratio=0.0),
                dict(type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True),
                dict(type='Pad', size_divisor=32),
                dict(type='KIEFormatBundle'),
                dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
            ],
            img_prefix='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train',
            loader=dict(
                type='HardDiskLoader', repeat=1,
                parser=dict(type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])),
            dict_file='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/dict.txt',
            test_mode=True))
    evaluation = dict(
        interval=1,
        metric='macro_f1',
        metric_options=dict(
            macro_f1=dict(
                ignores=[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 25])))
    model = dict(
        type='SDMGR',
        backbone=dict(type='UNet', base_channels=16),
        bbox_head=dict(
            type='SDMGRHead', visual_dim=16, num_chars=92, num_classes=26),
        visual_modality=True,
        train_cfg=None,
        test_cfg=None,
        class_list='/data/labels_convert/KIE_subtitle_classify/video_frame_dataset_train/class_list.txt'
    )
    optimizer = dict(type='Adam', weight_decay=0.0001)
    optimizer_config = dict(grad_clip=None)
    lr_config = dict(
        policy='step',
        warmup='linear',
        warmup_iters=1,
        warmup_ratio=1,
        step=[40, 50])
    total_epochs = 60
    checkpoint_config = dict(interval=1)
    log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]
    find_unused_parameters = True
    work_dir = './work_dirs/sdmgr_unet16_60e_subtitile_classify'
    gpu_ids = range(0, 1)

    /mmocr/mmocr/apis/train.py:79: UserWarning: config is now expected to have a runner section, please set runner in your config.
      'please set runner in your config.', UserWarning)
    2021-11-16 04:41:11,105 - mmocr - INFO - Start running, host: root@mrli-HP-Z440, work_dir: /mmocr/work_dirs/sdmgr_unet16_60e_subtitile_classify
    2021-11-16 04:41:11,105 - mmocr - INFO - workflow: [('train', 1)], max: 60 epochs
    2021-11-16 04:41:19,250 - mmocr - INFO - Epoch [1][50/59214] lr: 1.000e-03, eta: 6 days, 16:43:32, time: 0.163, data_time: 0.048, memory: 962, loss_node: nan, loss_edge: nan, acc_node: 11.0414, acc_edge: 4.0000, loss: nan
    2021-11-16 04:41:25,053 - mmocr - INFO - Epoch [1][100/59214] lr: 1.000e-03, eta: 5 days, 17:37:30, time: 0.116, data_time: 0.003, memory: 971, loss_node: nan, loss_edge: nan, acc_node: 10.4146, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:30,881 - mmocr - INFO - Epoch [1][150/59214] lr: 1.000e-03, eta: 5 days, 10:05:07, time: 0.117, data_time: 0.003, memory: 971, loss_node: nan, loss_edge: nan, acc_node: 11.1532, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:36,720 - mmocr - INFO - Epoch [1][200/59214] lr: 1.000e-03, eta: 5 days, 6:22:24, time: 0.117, data_time: 0.003, memory: 971, loss_node: nan, loss_edge: nan, acc_node: 10.5853, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:42,385 - mmocr - INFO - Epoch [1][250/59214] lr: 1.000e-03, eta: 5 days, 3:27:33, time: 0.113, data_time: 0.003, memory: 971, loss_node: nan, loss_edge: nan, acc_node: 7.8421, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:48,179 - mmocr - INFO - Epoch [1][300/59214] lr: 1.000e-03, eta: 5 days, 1:56:09, time: 0.116, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 8.9169, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:53,983 - mmocr - INFO - Epoch [1][350/59214] lr: 1.000e-03, eta: 5 days, 0:52:50, time: 0.116, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 7.6211, acc_edge: 0.0000, loss: nan
    2021-11-16 04:41:59,853 - mmocr - INFO - Epoch [1][400/59214] lr: 1.000e-03, eta: 5 days, 0:14:48, time: 0.117, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 8.9602, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:05,717 - mmocr - INFO - Epoch [1][450/59214] lr: 1.000e-03, eta: 4 days, 23:44:38, time: 0.117, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 10.1780, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:11,605 - mmocr - INFO - Epoch [1][500/59214] lr: 1.000e-03, eta: 4 days, 23:23:11, time: 0.118, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 11.6985, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:17,470 - mmocr - INFO - Epoch [1][550/59214] lr: 1.000e-03, eta: 4 days, 23:03:16, time: 0.117, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 9.7816, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:23,296 - mmocr - INFO - Epoch [1][600/59214] lr: 1.000e-03, eta: 4 days, 22:42:42, time: 0.117, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 11.2980, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:29,190 - mmocr - INFO - Epoch [1][650/59214] lr: 1.000e-03, eta: 4 days, 22:31:31, time: 0.118, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 12.2074, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:34,988 - mmocr - INFO - Epoch [1][700/59214] lr: 1.000e-03, eta: 4 days, 22:13:48, time: 0.116, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 10.6120, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:40,757 - mmocr - INFO - Epoch [1][750/59214] lr: 1.000e-03, eta: 4 days, 21:56:10, time: 0.115, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 7.0793, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:46,679 - mmocr - INFO - Epoch [1][800/59214] lr: 1.000e-03, eta: 4 days, 21:52:01, time: 0.118, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 10.7206, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:52,534 - mmocr - INFO - Epoch [1][850/59214] lr: 1.000e-03, eta: 4 days, 21:43:40, time: 0.117, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 13.3489, acc_edge: 0.0000, loss: nan
    2021-11-16 04:42:58,445 - mmocr - INFO - Epoch [1][900/59214] lr: 1.000e-03, eta: 4 days, 21:39:52, time: 0.118, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 10.2977, acc_edge: 0.0000, loss: nan
    2021-11-16 04:43:04,358 - mmocr - INFO - Epoch [1][950/59214] lr: 1.000e-03, eta: 4 days, 21:36:40, time: 0.118, data_time: 0.003, memory: 1132, loss_node: nan, loss_edge: nan, acc_node: 8.5367, acc_edge: 0.0000, loss: nan
    2021-11-16 04:43:10,178 - mmocr - INFO - Exp name: sdmgr_unet16_60e_subtitile_classify.py
    Traceback (most recent call last):
      File "/mmocr/tools/train.py", line 212, in <module>
        main()
      File "/mmocr/tools/train.py", line 208, in main
        meta=meta)
      File "/mmocr/mmocr/apis/train.py", line 161, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
        self.call_hook('after_train_iter')
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
        getattr(hook, fn_name)(self)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/hooks/optimizer.py", line 42, in after_train_iter
        runner.optimizer.step()
      File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
        return func(*args, **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/torch/optim/adam.py", line 96, in step
        grad = grad.add(p, alpha=group['weight_decay'])
    RuntimeError: CUDA error: an illegal memory access was encountered

Process finished with exit code 1

A sample from the training data:

    {"file_name": "0001.江苏网络电视台-晚间新闻 20211025_500.jpg", "height": 720, "width": 1280, "annotations": [{"box": [116.0, 601.0, 116.0, 644.0, 192.0, 644.0, 192.0, 601.0], "label": 6, "text": "新闻"}, {"box": [117.0, 566.0, 117.0, 602.0, 190.0, 602.0, 190.0, 566.0], "label": 6, "text": "晚间"}, {"box": [911.0, 456.0, 911.0, 487.0, 1004.0, 487.0, 1004.0, 456.0], "label": 1, "text": "王柏文"}, {"box": [1049.0, 456.0, 1051.0, 489.0, 1123.0, 486.0, 1122.0, 454.0], "label": 1, "text": "主播"}, {"box": [1035.0, 72.0, 1035.0, 95.0, 1197.0, 97.0, 1197.0, 74.0], "label": 6, "text": "JsTVcomp"}, {"box": [197.0, 62.0, 197.0, 92.0, 311.0, 90.0, 311.0, 59.0], "label": 6, "text": "江苏卫视"}, {"box": [1036.0, 31.0, 1036.0, 68.0, 1183.0, 68.0, 1183.0, 31.0], "label": 6, "text": "凉枝叹"}]}

My questions: why is the loss NaN, and why does the CUDA error occur? Thanks.
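The sample above follows the line-JSON layout declared by the LineJsonParser in the config (keys file_name, height, width, annotations). The sketch below reads such lines and collects the distinct labels and characters, which are what num_classes and the character dictionary must cover. It is a minimal sketch assuming that layout. The helper collect_vocab is hypothetical and not an MMOCR API. The example uses two annotations copied from the sample.

```python
import json


def collect_vocab(lines):
    """Collect distinct characters and labels from line-JSON KIE annotations.

    Hypothetical helper (not part of MMOCR). Assumes every non-empty line is a
    JSON object whose "annotations" items carry "text" and "label" fields.
    """
    chars, labels = set(), set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for ann in record["annotations"]:
            chars.update(ann["text"])  # adds each character of the text
            labels.add(ann["label"])
    return chars, labels


# Two annotations taken from the sample line above.
sample_line = json.dumps({
    "file_name": "0001.江苏网络电视台-晚间新闻 20211025_500.jpg",
    "height": 720,
    "width": 1280,
    "annotations": [
        {"box": [116.0, 601.0, 116.0, 644.0, 192.0, 644.0, 192.0, 601.0],
         "label": 6, "text": "新闻"},
        {"box": [911.0, 456.0, 911.0, 487.0, 1004.0, 487.0, 1004.0, 456.0],
         "label": 1, "text": "王柏文"},
    ],
}, ensure_ascii=False)

chars, labels = collect_vocab([sample_line])
print(sorted(labels))  # [1, 6]
print(len(chars))      # 5
```

You can run it over all of train.txt and then compare len(chars) and max(labels) + 1 against the config. This is a quick consistency check, assuming labels are 0-based class indices.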

gaotongxiao commented 2 years ago

Your data contains Chinese characters, but num_chars in SDMGRHead is still 92. You need to set num_chars to the actual number of characters in your dict.
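The following toy example shows why 92 breaks on Chinese text. It relies on two assumptions:

- Characters are numbered as described later in this thread, with index 0 reserved for a blank and dictionary characters numbered from 1.
- The head looks each index up in a table with num_chars entries.

Under these assumptions, any character whose index is num_chars or larger falls outside the table. That is consistent with, though not proof of, the NaN losses and the illegal memory access above. The helpers build_char_index and out_of_range_chars are hypothetical, and the dictionary is a made-up toy.

```python
def build_char_index(dict_chars):
    """Map characters to indices, reserving 0 for a blank entry.

    Hypothetical helper mirroring the numbering described in this thread
    (blank at 0, dictionary characters from 1); not an MMOCR API.
    """
    index = {"": 0}
    for i, ch in enumerate(dict_chars, start=1):
        index[ch] = i
    return index


def out_of_range_chars(texts, char_index, num_chars):
    """Return characters whose index does not fit a table of num_chars rows.

    Valid rows are 0 .. num_chars - 1. Characters missing from the
    dictionary are ignored here (an assumption, not MMOCR's behaviour).
    """
    bad = set()
    for text in texts:
        for ch in text:
            idx = char_index.get(ch)
            if idx is not None and idx >= num_chars:
                bad.add(ch)
    return bad


# Toy dictionary: 91 ASCII characters (indices 1..91), then 4 Chinese ones (92..95).
dict_chars = [chr(c) for c in range(33, 124)] + list("新闻主播")
char_index = build_char_index(dict_chars)
texts = ["新闻", "主播", "JsTVcomp"]

print(out_of_range_chars(texts, char_index, num_chars=92))  # the four Chinese characters
print(out_of_range_chars(texts, char_index, num_chars=len(dict_chars) + 1))  # set()
```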

mrlihellohorld commented 2 years ago

Thank you for your help. I have changed num_chars to 5111 (the actual number of characters in my dict) and num_classes to 7 (the number of categories), but now there is a new problem during evaluation:

    2021-11-16 11:44:41,993 - mmocr - INFO - workflow: [('train', 1)], max: 2 epochs
    2021-11-16 11:44:49,954 - mmocr - INFO - Epoch [1][50/361] lr: 1.000e-03, eta: 0:01:46, time: 0.159, data_time: 0.045, memory: 997, loss_node: 0.8057, loss_edge: 0.0423, acc_node: 76.2714, acc_edge: 97.9286, loss: 0.8480
    2021-11-16 11:44:55,909 - mmocr - INFO - Epoch [1][100/361] lr: 1.000e-03, eta: 0:01:26, time: 0.119, data_time: 0.003, memory: 997, loss_node: 0.5725, loss_edge: 0.0009, acc_node: 80.6275, acc_edge: 100.0000, loss: 0.5734
    2021-11-16 11:45:01,839 - mmocr - INFO - Epoch [1][150/361] lr: 1.000e-03, eta: 0:01:15, time: 0.119, data_time: 0.003, memory: 997, loss_node: 0.4464, loss_edge: 0.0019, acc_node: 84.7223, acc_edge: 100.0000, loss: 0.4482
    2021-11-16 11:45:07,592 - mmocr - INFO - Epoch [1][200/361] lr: 1.000e-03, eta: 0:01:06, time: 0.115, data_time: 0.003, memory: 997, loss_node: 0.4829, loss_edge: 0.0008, acc_node: 85.5441, acc_edge: 100.0000, loss: 0.4837
    2021-11-16 11:45:13,536 - mmocr - INFO - Epoch [1][250/361] lr: 1.000e-03, eta: 0:00:59, time: 0.119, data_time: 0.004, memory: 997, loss_node: 0.4424, loss_edge: 0.0012, acc_node: 87.4994, acc_edge: 100.0000, loss: 0.4436
    2021-11-16 11:45:19,420 - mmocr - INFO - Epoch [1][300/361] lr: 1.000e-03, eta: 0:00:52, time: 0.118, data_time: 0.004, memory: 997, loss_node: 0.3853, loss_edge: 0.0007, acc_node: 87.0350, acc_edge: 100.0000, loss: 0.3860
    2021-11-16 11:45:25,337 - mmocr - INFO - Epoch [1][350/361] lr: 1.000e-03, eta: 0:00:46, time: 0.118, data_time: 0.004, memory: 997, loss_node: 0.3552, loss_edge: 0.0007, acc_node: 87.0647, acc_edge: 100.0000, loss: 0.3559
    [>>>>>>>>>>>>>>>>>>>>>>>>>>>] 1705/1705, 32.6 task/s, elapsed: 52s, ETA: 0s
    Traceback (most recent call last):
      File "/mmocr/tools/train.py", line 212, in <module>
        main()
      File "/mmocr/tools/train.py", line 208, in main
        meta=meta)
      File "/mmocr/mmocr/apis/train.py", line 161, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
        self.call_hook('after_train_epoch')
      File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
        getattr(hook, fn_name)(self)
      File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 147, in after_train_epoch
        key_score = self.evaluate(runner, results)
      File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
        results, logger=runner.logger, **self.eval_kwargs)
      File "/mmocr/mmocr/datasets/kie_dataset.py", line 148, in evaluate
        return self.compute_macro_f1(results, **metric_options['macro_f1'])
      File "/mmocr/mmocr/datasets/kie_dataset.py", line 162, in compute_macro_f1
        node_f1s = compute_f1_score(node_preds, node_gts, ignores)
      File "/mmocr/mmocr/core/evaluation/kie_metric.py", line 22, in compute_f1_score
        gts * C + preds.argmax(1), minlength=C**2).view(C, C).float()
    RuntimeError: bincount only supports 1-d non-negative integral inputs.
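The failing call in kie_metric.py builds a C × C confusion matrix. It flattens each (ground truth, prediction) pair into the index gts * C + preds.argmax(1) and counts those indices with torch.bincount. As the error message says, torch.bincount accepts only non-negative 1-d integer input. The pure-Python stand-in below does the same counting but checks its inputs explicitly, which shows the failure conditions:

- A negative label gives a negative flattened index.
- A label of C or more does not fit the C × C matrix.

This is a sketch, not MMOCR's code, and confusion_matrix is a hypothetical name. We have not established which value triggered the error here, so checking the annotation labels against [0, num_classes) is only one way to narrow it down.

```python
def confusion_matrix(gts, preds, num_classes):
    """Count (gt, pred) pairs via the flattened index gt * C + pred.

    Pure-Python stand-in for bincount(gts * C + preds, minlength=C**2)
    reshaped to C x C. Raises a readable error instead of failing inside
    the metric. Rows are ground-truth classes, columns predicted classes.
    """
    C = num_classes
    counts = [0] * (C * C)
    for gt, pred in zip(gts, preds):
        if not (0 <= gt < C and 0 <= pred < C):
            raise ValueError(f"label out of range [0, {C}): gt={gt}, pred={pred}")
        counts[gt * C + pred] += 1
    # Reshape the flat counts into C rows of C columns.
    return [counts[r * C:(r + 1) * C] for r in range(C)]


print(confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3))
# [[1, 0, 0], [0, 1, 1], [0, 0, 1]]

try:
    confusion_matrix([-1, 1], [0, 1], num_classes=3)
except ValueError as err:
    print(err)  # label out of range [0, 3): gt=-1, pred=0
```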

cuhk-hbsun commented 2 years ago

num_chars should be dict_size + 1, since a blank character is added in https://github.com/open-mmlab/mmocr/blob/main/mmocr/datasets/kie_dataset.py#L59. Please change num_chars to 5112. If the problem persists, please attach part of your annotation file (around 100 lines) and your dictionary file here, and we will use them to locate the problem.
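The rule in this reply can be derived directly from the dictionary file. The helper below assumes one character per line in dict.txt, and the +1 accounts for the added blank. For the 5111-line dictionary in this thread, it would give 5112. It is a minimal sketch, and num_chars_from_dict is a hypothetical name. Only the line terminator is stripped, so a dictionary entry that is a literal space still counts.

```python
import os
import tempfile


def num_chars_from_dict(dict_file, encoding="utf-8"):
    """Return num_chars for SDMGRHead: dictionary size + 1 for the blank.

    Hypothetical helper; assumes one character per line in dict_file.
    """
    with open(dict_file, encoding=encoding) as f:
        chars = [line.rstrip("\r\n") for line in f]
    return len(chars) + 1


# Toy dictionary with four characters, one per line.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dict.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("新\n闻\n主\n播\n")
    toy_num_chars = num_chars_from_dict(path)

print(toy_num_chars)  # 5
```

The resulting value then goes into bbox_head=dict(type='SDMGRHead', ..., num_chars=...) in the config.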

open-mmlab / mmocr, issue #591