mindspore-lab / mindocr

A toolbox of ocr models and algorithms based on MindSpore
https://mindspore-lab.github.io/mindocr/
Apache License 2.0

Error when running PP-OCRv4 inference on a machine with eight 910B3 cards #700

Closed bltcn closed 6 months ago

bltcn commented 6 months ago

Operating system:
[root@localhost data]# uname -r
4.19.90-24.4.v2101.ky10.aarch64
[root@localhost data]# cat /etc/os-release
NAME="Kylin Linux Advanced Server"
VERSION="V10 (Sword)"
ID="kylin"
VERSION_ID="V10"
PRETTY_NAME="Kylin Linux Advanced Server V10 (Sword)"
ANSI_COLOR="0;31"

Image version:
swr.cn-central-221.ovaijisuan.com/mindocr/mindocr_dev_ms_2_2_10_cann7_0_py39:v1

Driver version:
(MindSpore) [root@e9895d5bcb9e mindocr]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.rc2.2                 Version: 23.0.rc3.3                                         |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 84.7        34                0    / 0             |
| 0                         | 0000:C1:00.0  | 0           0    / 0          4156 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 84.7        33                0    / 0             |
| 0                         | 0000:C2:00.0  | 0           0    / 0          4154 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 84.7        32                0    / 0             |
| 0                         | 0000:81:00.0  | 0           0    / 0          4154 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 84.8        32                0    / 0             |
| 0                         | 0000:82:00.0  | 0           0    / 0          4154 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 84.8        35                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          4154 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 84.8        35                0    / 0             |
| 0                         | 0000:02:00.0  | 0           0    / 0          4155 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 94.2        35                0    / 0             |
| 0                         | 0000:41:00.0  | 0           0    / 0          4155 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 84.7        36                0    / 0             |
| 0                         | 0000:42:00.0  | 0           0    / 0          4155 / 65536         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 0                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 2                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 3                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 4                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 5                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 6                                                            |
+===========================+===============+====================================================+
| No running processes found in NPU 7                                                            |
+===========================+===============+====================================================+

Run log:
(MindSpore) [root@e9895d5bcb9e mindocr]# python deploy/py_infer/infer.py --input_images_dir deploy/py_infer/example/dataset/cls_rec/ -rec_model_path=tools/ppocr_models/rec_crnn_dynamic_output.mindir --rec_model_name_or_config=ch_pp_rec_OCRv4 --character_dict_path=../ppocr_models/ppocr_keys_v1.txt --res_save_dir=result > 1.txt
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build/__init__.py", line 45, in <module>
    from ._check_build import check_build  # noqa
ImportError: /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ma-user/code/mindocr/deploy/py_infer/infer.py", line 7, in <module>
    from src import infer_args  # noqa
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/infer_args.py", line 5, in <module>
    from .infer import TaskType
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/infer/__init__.py", line 3, in <module>
    from .infer_cls import TextClassifier
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/infer/infer_cls.py", line 6, in <module>
    from ..data_process import build_postprocess, build_preprocess, cv_utils, gear_utils
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/data_process/__init__.py", line 1, in <module>
    from .postprocess import build_postprocess
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/data_process/postprocess/__init__.py", line 1, in <module>
    from .builder import build_postprocess
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/data_process/postprocess/builder.py", line 10, in <module>
    from . import adapted_postprocess
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/data_process/postprocess/adapted_postprocess.py", line 3, in <module>
    from .det_db_postprocess import *  # noqa
  File "/home/ma-user/code/mindocr/deploy/py_infer/src/data_process/postprocess/det_db_postprocess.py", line 18, in <module>
    from mindocr.postprocess import det_base_postprocess  # noqa
  File "/home/ma-user/code/mindocr/mindocr/__init__.py", line 1, in <module>
    from . import data, losses, metrics, models, postprocess, utils
  File "/home/ma-user/code/mindocr/mindocr/metrics/__init__.py", line 1, in <module>
    from .builder import build_metric
  File "/home/ma-user/code/mindocr/mindocr/metrics/builder.py", line 1, in <module>
    from . import cls_metrics, det_metrics, kie_metrics, layout_metrics, rec_metrics
  File "/home/ma-user/code/mindocr/mindocr/metrics/kie_metrics.py", line 6, in <module>
    import seqeval.metrics
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/seqeval/metrics/__init__.py", line 1, in <module>
    from seqeval.metrics.sequence_labeling import (accuracy_score,
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/seqeval/metrics/sequence_labeling.py", line 14, in <module>
    from seqeval.metrics.v1 import SCORES, _precision_recall_fscore_support
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/seqeval/metrics/v1.py", line 5, in <module>
    from sklearn.exceptions import UndefinedMetricWarning
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__init__.py", line 83, in <module>
    from . import (
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build/__init__.py", line 47, in <module>
    raise_build_error(e)
  File "/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build/__init__.py", line 31, in raise_build_error
    raise ImportError("""%s
ImportError: /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build/../../scikit_learn.libs/libgomp-d22c30c5.so.1.0.0: cannot allocate memory in static TLS block


Contents of /home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/sklearn/__check_build:
_check_build.cpython-39-aarch64-linux-gnu.so  __pycache__  __init__.py


It seems that scikit-learn has not been built correctly.

If you have installed scikit-learn from source, please do not forget to build the package before using it: run python setup.py install or make in the source directory.

If you have used an installer, please check that it is suited for your Python version, your operating system and your platform.

Attachments: infer.txt npu-smi-info.txt os.txt

bltcn commented 6 months ago

This issue is resolved; see https://blog.csdn.net/mbdong/article/details/122321835 for reference. Adding the following line to env_setup.sh in the image makes inference run normally:

export LD_PRELOAD=/home/ma-user/anaconda3/envs/MindSpore/lib/python3.9/site-packages/scikit_learn.libs/libgomp-d22c30c5.so.1.0.0:$LD_PRELOAD
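The hashed file name (libgomp-d22c30c5.so.1.0.0) is specific to the scikit-learn wheel in this image, so another environment may ship a differently named copy. Below is a minimal Python sketch, not part of mindocr, that looks for the bundled libgomp under scikit_learn.libs and prints a matching export line. The helper names find_bundled_libgomp and prepend_ld_preload are hypothetical, and the sketch assumes the library sits directly under site-packages/scikit_learn.libs, as it does in the traceback above.

```python
import glob
import os
import sysconfig


def find_bundled_libgomp(site_packages=None):
    """Return sorted paths of libgomp copies under <site_packages>/scikit_learn.libs.

    Assumption: the wheel bundles libgomp in a sibling 'scikit_learn.libs'
    directory, as seen in the traceback in this issue.
    """
    if site_packages is None:
        # platlib is where platform-specific packages such as scikit-learn live.
        site_packages = sysconfig.get_paths()["platlib"]
    pattern = os.path.join(site_packages, "scikit_learn.libs", "libgomp*.so*")
    return sorted(glob.glob(pattern))


def prepend_ld_preload(lib_path, current=""):
    """Place lib_path first in a colon-separated LD_PRELOAD value, dropping duplicates."""
    entries = [p for p in current.split(":") if p and p != lib_path]
    return ":".join([lib_path] + entries)


if __name__ == "__main__":
    matches = find_bundled_libgomp()
    if matches:
        value = prepend_ld_preload(matches[0], os.environ.get("LD_PRELOAD", ""))
        # Paste the printed line into env_setup.sh (or run it in the shell).
        print(f"export LD_PRELOAD={value}")
    else:
        print("# no bundled libgomp found under scikit_learn.libs")
```

LD_PRELOAD has to be set before the Python process starts, which is why the script only prints the line instead of modifying os.environ itself.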