the batchsize of sdmgr - GitHub Issues

mrlihellohorld commented 2 years ago

Hi, I want to change the batch size when training the SDMGR model, so I changed 'samples_per_gpu' from 1 to 16. Training runs, but evaluation fails with this error: The size of tensor a (2067) must match the size of tensor b (7129) at non-singleton dimension 0. Could you help me solve the problem?

gaotongxiao commented 2 years ago

Please share detailed information using the error report template.

mrlihellohorld commented 2 years ago

ssh://root@172.16.135.60:2244/opt/conda/bin/python -u /root/.pycharm_helpers/pydev/pydevd.py --multiproc --qt-support=auto --client 0.0.0.0 --port 35861 --file /mmocr/tools/train.py configs/kie/sdmgr/sdmgr_unet16_60e_subtitile_classify.py
Connected to pydev debugger (build 212.5457.59)
2021-11-18 06:29:39,922 - mmocr - INFO - Environment info:

sys.platform: linux
Python: 3.7.7 (default, Mar 23 2020, 22:36:06) [GCC 7.3.0]
CUDA available: True
GPU 0: GeForce RTX 2070
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.1, V10.1.243
GCC: gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
PyTorch: 1.5.0
PyTorch compiling details: PyTorch built with:

GCC 7.3
C++ Version: 201402
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
OpenMP 201511 (a.k.a. OpenMP 4.5)
NNPACK is enabled
CPU capability usage: AVX2
CUDA Runtime 10.1
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
CuDNN 7.6.3
Magma 2.5.2
Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_INTERNAL_THREADPOOL_IMPL -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

TorchVision: 0.6.0a0+82fd1c8
OpenCV: 4.5.2
MMCV: 1.3.4
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMOCR: 0.2.0+ae90dea

2021-11-18 06:29:41,772 - mmocr - INFO - Distributed training: False 2021-11-18 06:29:43,616 - mmocr - INFO - Config: img_norm_cfg = dict( mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) max_scale = 1024 min_scale = 512 train_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict( type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels']) ] test_pipeline = [ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes']) ] dataset_type = 'KIEDataset' data_root = '/data/labels_convert/KIE_subtitle_classify' loader = dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])) train = dict( type='KIEDataset', ann_file='/data/labels_convert/KIE_subtitle_classify/train.txt', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict( type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels']) ], img_prefix='/data/labels_convert/KIE_subtitle_classify', loader=dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', 
keys=['file_name', 'height', 'width', 'annotations'])), dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt', test_mode=False) test = dict( type='KIEDataset', ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict(type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes']) ], img_prefix='/data/labels_convert/KIE_subtitle_classify', loader=dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])), dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt', test_mode=True) data = dict( samples_per_gpu=4, workers_per_gpu=4, train=dict( type='KIEDataset', ann_file='/data/labels_convert/KIE_subtitle_classify/train.txt', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict( type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes', 'gt_labels']) ], img_prefix='/data/labels_convert/KIE_subtitle_classify', loader=dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])), dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt', test_mode=False), val=dict( type='KIEDataset', ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), 
dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict( type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes']) ], img_prefix='/data/labels_convert/KIE_subtitle_classify', loader=dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])), dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt', test_mode=True), test=dict( type='KIEDataset', ann_file='/data/labels_convert/KIE_subtitle_classify/test.txt', pipeline=[ dict(type='LoadImageFromFile'), dict(type='LoadAnnotations'), dict(type='Resize', img_scale=(1024, 512), keep_ratio=True), dict(type='RandomFlip', flip_ratio=0.0), dict( type='Normalize', mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True), dict(type='Pad', size_divisor=32), dict(type='KIEFormatBundle'), dict( type='Collect', keys=['img', 'relations', 'texts', 'gt_bboxes']) ], img_prefix='/data/labels_convert/KIE_subtitle_classify', loader=dict( type='HardDiskLoader', repeat=1, parser=dict( type='LineJsonParser', keys=['file_name', 'height', 'width', 'annotations'])), dict_file='/data/labels_convert/KIE_subtitle_classify/dict.txt', test_mode=True)) evaluation = dict( interval=1, metric='macro_f1', metric_options=dict(macro_f1=dict(ignores=[]))) model = dict( type='SDMGR', backbone=dict(type='UNet', base_channels=16), bbox_head=dict( type='SDMGRHead', visual_dim=16, num_chars=5111, num_classes=7), visual_modality=True, train_cfg=None, test_cfg=None, class_list='/data/labels_convert/KIE_subtitle_classify/class_list.txt') optimizer = dict(type='Adam', weight_decay=0.0001) optimizer_config = dict(grad_clip=None) lr_config = dict( policy='step', warmup='linear', warmup_iters=1, warmup_ratio=1, step=[40, 50]) total_epochs = 100 checkpoint_config = dict(interval=1) log_config = dict(interval=50, 
hooks=[dict(type='TextLoggerHook')]) dist_params = dict(backend='nccl') log_level = 'INFO' load_from = None resume_from = None workflow = [('train', 16)] find_unused_parameters = True work_dir = './work_dirs/sdmgr_unet16_60e_subtitile_classify' gpu_ids = range(0, 1)

/mmocr/mmocr/apis/train.py:79: UserWarning: config is now expected to have a runner section, please set runner in your config.
  'please set runner in your config.', UserWarning)
2021-11-18 06:32:35,865 - mmocr - INFO - Start running, host: root@mrli-HP-Z440, work_dir: /mmocr/work_dirs/sdmgr_unet16_60e_subtitile_classify
2021-11-18 06:32:35,867 - mmocr - INFO - workflow: [('train', 16)], max: 100 epochs
2021-11-18 06:32:55,801 - mmocr - INFO - Epoch [1][50/220] lr: 1.000e-03, eta: 2:25:40, time: 0.398, data_time: 0.076, memory: 3582, loss_node: 0.7484, loss_edge: 0.0372, acc_node: 76.9887, acc_edge: 100.0000, loss: 0.7856
2021-11-18 06:33:12,774 - mmocr - INFO - Epoch [1][100/220] lr: 1.000e-03, eta: 2:14:37, time: 0.339, data_time: 0.023, memory: 3582, loss_node: 0.4184, loss_edge: 0.0006, acc_node: 86.3113, acc_edge: 100.0000, loss: 0.4190
2021-11-18 06:33:30,242 - mmocr - INFO - Epoch [1][150/220] lr: 1.000e-03, eta: 2:11:56, time: 0.349, data_time: 0.024, memory: 3582, loss_node: 0.2208, loss_edge: 0.0002, acc_node: 92.5571, acc_edge: 100.0000, loss: 0.2209
2021-11-18 06:33:48,095 - mmocr - INFO - Epoch [1][200/220] lr: 1.000e-03, eta: 2:11:09, time: 0.357, data_time: 0.023, memory: 3582, loss_node: 0.1742, loss_edge: 0.0002, acc_node: 94.9150, acc_edge: 100.0000, loss: 0.1745
[>>>>>>> ] 220/880, 7.0 task/s, elapsed: 31s, ETA: 94s
Traceback (most recent call last):
  File "/root/.pycharm_helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/root/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/mmocr/tools/train.py", line 212, in <module>
    main()
  File "/mmocr/tools/train.py", line 208, in main
    meta=meta)
  File "/mmocr/mmocr/apis/train.py", line 161, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 147, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/mmocr/mmocr/datasets/kie_dataset.py", line 148, in evaluate
    return self.compute_macro_f1(results, **metric_options['macro_f1'])
  File "/mmocr/mmocr/datasets/kie_dataset.py", line 162, in compute_macro_f1
    node_f1s = compute_f1_score(node_preds, node_gts, ignores)
  File "/mmocr/mmocr/core/evaluation/kie_metric.py", line 22, in compute_f1_score
    gts * C + preds.argmax(1), minlength=C**2).view(C, C).float()
RuntimeError: The size of tensor a (2067) must match the size of tensor b (7129) at non-singleton dimension 0

Process finished with exit code 1
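For readers following the traceback: the last frame appears to combine the ground-truth labels and the predicted classes into a flat confusion-matrix index, which only works when both cover the same number of nodes. The plain-Python sketch below illustrates that requirement. It is not MMOCR's code; `confusion_counts` and its arguments are hypothetical names chosen for this example.

```python
def confusion_counts(gt_classes, pred_classes, num_classes):
    """Count (ground truth, prediction) pairs in a num_classes x num_classes table.

    Hypothetical helper for illustration only; it mirrors the idea of indexing
    a flat histogram with gt * C + pred, one entry per node.
    """
    if len(gt_classes) != len(pred_classes):
        # The analogue of the size mismatch reported in the traceback.
        raise ValueError(
            f"{len(pred_classes)} predictions cannot be paired with "
            f"{len(gt_classes)} ground-truth labels")
    flat = [0] * (num_classes ** 2)
    for gt, pred in zip(gt_classes, pred_classes):
        flat[gt * num_classes + pred] += 1
    # Reshape the flat histogram into rows (ground truth) x columns (prediction).
    return [flat[r * num_classes:(r + 1) * num_classes]
            for r in range(num_classes)]


print(confusion_counts([0, 1, 1], [0, 1, 0], num_classes=2))  # [[1, 0], [1, 1]]
```

Read this way, the error means the two tensors being added hold different numbers of nodes (2067 and 7129), so the predictions and the ground truth collected during evaluation do not line up one-to-one.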

gaotongxiao commented 2 years ago

SDMGR does not support batch testing yet. You can move samples_per_gpu into data.train so that batch processing will only be activated for training. You may find an example illustration here.
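A minimal sketch of what that suggestion could look like, assuming the installed MMOCR version reads samples_per_gpu from data.train as described in the reply (worth confirming against that version's documentation). The dataset entries are abbreviated from the config posted above, and `batch_size_for` is a hypothetical helper, not an MMOCR function; it only illustrates the intended effect of a per-split batch size.

```python
# Abbreviated sketch of the `data` section; pipelines and loaders are omitted.
data_root = '/data/labels_convert/KIE_subtitle_classify'

data = dict(
    workers_per_gpu=4,
    train=dict(
        type='KIEDataset',
        ann_file=f'{data_root}/train.txt',
        samples_per_gpu=16,  # assumption: batch size set for training only
        test_mode=False),
    val=dict(
        type='KIEDataset',
        ann_file=f'{data_root}/test.txt',
        test_mode=True),
    test=dict(
        type='KIEDataset',
        ann_file=f'{data_root}/test.txt',
        test_mode=True))


def batch_size_for(data_cfg, split, default=1):
    """Hypothetical helper: batch size of one split, falling back to 1."""
    return data_cfg[split].get('samples_per_gpu', default)


for split in ('train', 'val', 'test'):
    print(split, batch_size_for(data, split))
```

With no top-level samples_per_gpu left in `data`, only the training split carries a batch size above 1 in this sketch, which matches the stated goal of keeping evaluation and testing at one sample per step.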

open-mmlab / mmocr

the batchsize of sdmgr #598
