open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

the batchsize of sdmgr #598

Closed mrlihellohorld closed 2 years ago

mrlihellohorld commented 2 years ago

Hi, I want to change the batch size when training the SDMGR model, so I changed 'samples_per_gpu' from 1 to 16, but I hit an error during evaluation: The size of tensor a (2067) must match the size of tensor b (7129) at non-singleton dimension 0. Could you help me solve the problem?

gaotongxiao commented 2 years ago

Please share detailed information using the error report template.

mrlihellohorld commented 2 years ago

ssh://root@172.16.135.60:2244/opt/conda/bin/python -u /root/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 35861 --file /mmocr/tools/train.py configs/kie/sdmgr/sdmgr_unet16_60e_subtitile_classify.py
Connected to pydev debugger (build 212.5457.59)
2021-11-18 06:29:39,922 - mmocr - INFO - Environment info:

sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
GPU 0: GeForce RTX 2070
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.5.2
MMCV: 1.3.4
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMOCR: 0.2.0+ae90dea

2021-11-18 06:29:41,772 - mmocr - INFO - Distributed training: False
2021-11-18 06:29:43,616 - mmocr - INFO - Config:
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
max_scale = 1024
min_scale = 512
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='KIEFormatBundle'),
    dict(
        type='Collect',
        keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size_divisor=32),
    dict(type='KIEFormatBundle'),
    dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
]
dataset_type = 'KIEDataset'
data_root = '/data/labels_convert/KIE_subtitle_classify'
loader = dict(
    type='HardDiskLoader',
    repeat=1,
    parser=dict(
        type='LineJsonParser',
        keys=['file_name', 'height', 'width', 'annotations']))
train = dict(
    type='KIEDataset',
    ann_file='/data/labels_convert/KIE_subtitle_classify/train.txt',
    pipeline=[
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.0),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='KIEFormatBundle'),
        dict(
            type='Collect',
            keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
    ],
    img_prefix='/data/labels_convert/KIE_subtitle_classify',
    loader=dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt',
    test_mode=False)
test = dict(
    type='KIEDataset',
    ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt',
    pipeline=[
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
        dict(type='RandomFlip', flip_ratio=0.0),
        dict(
            type='Normalize',
            mean=[123.675, 116.28, 103.53],
            std=[58.395, 57.12, 57.375],
            to_rgb=True),
        dict(type='Pad', size_divisor=32),
        dict(type='KIEFormatBundle'),
        dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes'])
    ],
    img_prefix='/data/labels_convert/KIE_subtitle_classify',
    loader=dict(
        type='HardDiskLoader',
        repeat=1,
        parser=dict(
            type='LineJsonParser',
            keys=['file_name', 'height', 'width', 'annotations'])),
    dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt',
    test_mode=True)
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
    train=dict(
        type='KIEDataset',
        ann_file='/data/labels_convert/KIE_subtitle_classify/train.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='KIEFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels'])
        ],
        img_prefix='/data/labels_convert/KIE_subtitle_classify',
        loader=dict(
            type='HardDiskLoader',
            repeat=1,
            parser=dict(
                type='LineJsonParser',
                keys=['file_name', 'height', 'width', 'annotations'])),
        dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt',
        test_mode=False),
    val=dict(
        type='KIEDataset',
        ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='KIEFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'relations', 'texts', 'gt_bboxes'])
        ],
        img_prefix='/data/labels_convert/KIE_subtitle_classify',
        loader=dict(
            type='HardDiskLoader',
            repeat=1,
            parser=dict(
                type='LineJsonParser',
                keys=['file_name', 'height', 'width', 'annotations'])),
        dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt',
        test_mode=True),
    test=dict(
        type='KIEDataset',
        ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt',
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
            dict(type='Resize', img_scale=(1024, 512), keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='Pad', size_divisor=32),
            dict(type='KIEFormatBundle'),
            dict(
                type='Collect',
                keys=['img', 'relations', 'texts', 'gt_bboxes'])
        ],
        img_prefix='/data/labels_convert/KIE_subtitle_classify',
        loader=dict(
            type='HardDiskLoader',
            repeat=1,
            parser=dict(
                type='LineJsonParser',
                keys=['file_name', 'height', 'width', 'annotations'])),
        dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt',
        test_mode=True))
evaluation = dict(
    interval=1,
    metric='macro_f1',
    metric_options=dict(macro_f1=dict(ignores=[])))
model = dict(
    type='SDMGR',
    backbone=dict(type='UNet', base_channels=16),
    bbox_head=dict(
        type='SDMGRHead', visual_dim=16, num_chars=5111, num_classes=7),
    visual_modality=True,
    train_cfg=None,
    test_cfg=None,
    class_list='/data/labels_convert/KIE_subtitle_classify/class_list.txt')
optimizer = dict(type='Adam', weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=1,
    warmup_ratio=1,
    step=[40, 50])
total_epochs = 100
checkpoint_config = dict(interval=1)
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 16)]
find_unused_parameters = True
work_dir = './work_dirs/sdmgr_unet16_60e_subtitile_classify'
gpu_ids = range(0, 1)

/mmocr/mmocr/apis/train.py:79: UserWarning: config is now expected to have a runner section, please set runner in your config.
  'please set runner in your config.', UserWarning)
2021-11-18 06:32:35,865 - mmocr - INFO - Start running, host: root@mrli-HP-Z440, work_dir: /mmocr/work_dirs/sdmgr_unet16_60e_subtitile_classify
2021-11-18 06:32:35,867 - mmocr - INFO - workflow: [('train', 16)], max: 100 epochs
2021-11-18 06:32:55,801 - mmocr - INFO - Epoch [1][50/220] lr: 1.000e-03, eta: 2:25:40, time: 0.398, data_time: 0.076, memory: 3582, loss_node: 0.7484, loss_edge: 0.0372, acc_node: 76.9887, acc_edge: 100.0000, loss: 0.7856
2021-11-18 06:33:12,774 - mmocr - INFO - Epoch [1][100/220] lr: 1.000e-03, eta: 2:14:37, time: 0.339, data_time: 0.023, memory: 3582, loss_node: 0.4184, loss_edge: 0.0006, acc_node: 86.3113, acc_edge: 100.0000, loss: 0.4190
2021-11-18 06:33:30,242 - mmocr - INFO - Epoch [1][150/220] lr: 1.000e-03, eta: 2:11:56, time: 0.349, data_time: 0.024, memory: 3582, loss_node: 0.2208, loss_edge: 0.0002, acc_node: 92.5571, acc_edge: 100.0000, loss: 0.2209
2021-11-18 06:33:48,095 - mmocr - INFO - Epoch [1][200/220] lr: 1.000e-03, eta: 2:11:09, time: 0.357, data_time: 0.023, memory: 3582, loss_node: 0.1742, loss_edge: 0.0002, acc_node: 94.9150, acc_edge: 100.0000, loss: 0.1745
[>>>>>>> ] 220/880, 7.0 task/s, elapsed: 31s, ETA: 94s
Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mmocr/tools/train.py", line 212, in <module>
    main()
  File "/mmocr/tools/train.py", line 208, in main
    meta=meta)
  File "/mmocr/mmocr/apis/train.py", line 161, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 147, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/mmocr/mmocr/datasets/kie_dataset.py", line 148, in evaluate
    return self.compute_macro_f1(results, **metric_options['macro_f1'])
  File "/mmocr/mmocr/datasets/kie_dataset.py", line 162, in compute_macro_f1
    node_f1s = compute_f1_score(node_preds, node_gts, ignores)
  File "/mmocr/mmocr/core/evaluation/kie_metric.py", line 22, in compute_f1_score
    gts * C + preds.argmax(1), minlength=C**2).view(C, C).float()
RuntimeError: The size of tensor a (2067) must match the size of tensor b (7129) at non-singleton dimension 0

Process finished with exit code 1

gaotongxiao commented 2 years ago

SDMGR does not support batch testing yet. You can move samples_per_gpu into data.train so that batch processing will only be activated for training. You may find an example illustration here.
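
For readers landing here, a minimal sketch of what that suggestion might look like, written as plain Python dicts in the style of the config printed earlier in this thread. It assumes, as the reply implies but without verification against MMOCR 0.2.0, that the training loader picks up samples_per_gpu from data.train and that validation and testing fall back to a batch size of 1 when no value is given. The dataset entries are abbreviated, and the helper batch_size_for is hypothetical: it only makes the assumed lookup explicit and is not an MMOCR function.

```python
# Sketch only: keep ann_file, pipeline, loader, etc. from your own config.
data = dict(
    workers_per_gpu=4,
    # No top-level samples_per_gpu here, so it cannot leak into val/test.
    train=dict(
        type='KIEDataset',
        samples_per_gpu=16,  # assumed placement: batching for training only
        test_mode=False),
    val=dict(type='KIEDataset', test_mode=True),
    test=dict(type='KIEDataset', test_mode=True))


def batch_size_for(split, cfg=data):
    """Hypothetical helper (not an MMOCR API): the batch size a split
    would get if loaders read samples_per_gpu per split, defaulting to 1."""
    return cfg[split].get('samples_per_gpu', 1)
```

Under these assumptions, training would run with 16 samples per GPU while evaluation keeps processing one image at a time, which is the mode SDMGR supports according to the reply above.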