mindspore-lab / mindocr

A toolbox of ocr models and algorithms based on MindSpore
https://mindspore-lab.github.io/mindocr/
Apache License 2.0

PaddleOCR conversion error #690

Closed adamzhg closed 4 months ago

adamzhg commented 7 months ago

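I tried converting PaddleOCR models for the Ascend 310 and the conversion failed.

I. Environment

- Server: Huawei Kunpeng server
- OS: Ubuntu 20.04
- Accelerator card: Atlas 300I Model 3010 (with 4 Ascend 310 chips)
- mindspore-lite: 2.2.13
- CANN: 7.0.0.beta1
- Firmware and driver: 1.0.22.alpha (A300-3010-npu-firmware_7.1.0.3.220.run, A300-3010-npu-driver_23.0.0_linux-aarch64.run)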

II. Problem description

1. Running the paddle2mindir.sh script directly, with ppocr_model_name set to either ch_PP-OCRv4 or ch_PP-OCRv4_server, fails with: EZ3002: Optype [Conv2D] of Ops kernel [AIcoreEngine] is unsupported. Reason: Dynamic shape is not supported on this chip!.

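2. Without the paddle2mindir.sh script, I downloaded ch_ppocr_mobile_v2.0_cls_infer, ch_PP-OCRv4_det_infer and ch_PP-OCRv4_rec_infer separately, then converted each one to ONNX and then to Lite MindIR.

  1) Converting to ONNX with paddle2onnx succeeds for all three models.

  2) Converting to Lite MindIR with converter_lite: for ch_ppocr_mobile_v2.0_cls_infer and ch_PP-OCRv4_det_infer, after changing the section header in config.txt to [ascend_context] (paddle2mindir.sh uses the [acl_build_options] dynamic-shape format), conversion succeeds with both a static shape and dynamic-shape gears (a fixed set of candidate shapes).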

For ch_PP-OCRv4_rec_infer, conversion still fails whether config.txt uses a static shape or dynamic-shape gears:

    [ERROR] ME(173355,fffface8f010,converter_lite):2024-04-12-09:43:08.099.512 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:155] BuildAirModel] Call aclgrphBuildModel fail: EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
    Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
    TraceBack (most recent call last):
    [Node:p2o.Reshape.67_ascend_mbatch_batch_0] Check shape failed, as x-input shape is [1,65,360], with shape size(23400), is input dynamic shape[0]. Shape-input shape is [5], with shape size(5). Y-output shape inferred is [0,65,3,8,15], with shape size(0).[FUNC:ReshapeInfer][FILE:array_ops.cc][LINE:1717]
    Call InferShapeAndType for node:p2o.Reshape.67_ascend_mbatch_batch_0(Reshape) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:119]
    process pass InferShapePass on node:p2o.Reshape.67_ascend_mbatch_batch_0 failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:571]
    build graph failed, graph id:0, ret:1343242270[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
    ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4541]
    The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]

    [ERROR] ME(173355,fffface8f010,converter_lite):2024-04-12-09:43:09.292.674 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:238] operator()] Convert model from MindIR to OM failed
    [ERROR] ME(173355,fffface8f010,converter_lite):2024-04-12-09:43:09.293.737 [mindspore/ccsrc/cxx_api/model/model_converter_utils/multi_process.cc:140] ChildProcess] Child process process failed
    [WARNING] ME(173331,ffff9e5119c0,converter_lite):2024-04-12-09:43:09.320.495 [mindspore/ccsrc/cxx_api/model/model_converter_utils/multi_process.cc:228] HeartbeatThreadFuncInner] Peer stopped
    [ERROR] ME(173331,fffface8f010,converter_lite):2024-04-12-09:43:09.321.148 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:218] operator()] Receive result model from child process failed
    [ERROR] ME(173331,fffface8f010,converter_lite):2024-04-12-09:43:09.323.533 [mindspore/ccsrc/cxx_api/model/model_converter_utils/multi_process.cc:118] ParentProcess] Parent process process failed
    [ERROR] ME(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.503 [mindspore/ccsrc/cxx_api/model/acl/model_converter.cc:251] LoadMindIR] Convert MindIR model to OM model failed
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.610 [mindspore/lite/tools/converter/adapter/acl/src/acl_pass_impl.cc:805] ConvertGraphToOm] Model converter load mindir failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.646 [mindspore/lite/tools/converter/adapter/acl/src/acl_pass_impl.cc:854] BuildGraph] Convert graph to om failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.708 [mindspore/lite/tools/converter/adapter/acl/src/acl_pass_impl.cc:1112] Run] Build graph failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.753 [mindspore/lite/tools/converter/adapter/acl/acl_pass.cc:42] Run] Acl pass impl run failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.788 [mindspore/lite/tools/converter/anf_transform.cc:469] RunConvertPass] Acl pass failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.840 [mindspore/lite/tools/converter/anf_transform.cc:662] RunPass] Run convert pass failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.329.870 [mindspore/lite/tools/converter/anf_transform.cc:766] TransformFuncGraph] Proc online transform failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.121 [mindspore/lite/tools/converter/anf_transform.cc:858] Transform] optimizer failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.159 [mindspore/lite/tools/converter/converter_funcgraph.cc:489] Optimize] Transform anf graph failed.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.216 [mindspore/lite/tools/converter/converter.cc:1030] HandleGraphCommon] Optimize func graph failed: -2 NULL pointer returned.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.260 [mindspore/lite/tools/converter/converter.cc:980] Convert] Handle graph failed: -2 NULL pointer returned.
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.292 [mindspore/lite/tools/converter/converter.cc:1168] RunConverter] Convert model failed
    [ERROR] LITE(173331,fffface8f010,converter_lite):2024-04-12-09:43:10.330.332 [mindspore/lite/tools/converter/cxx_api/converter.cc:374] Convert] Convert model failed, ret=NULL pointer returned.
    ERROR [mindspore/lite/tools/converter/converter_lite/main.cc:104] main] Convert failed. Ret: NULL pointer returned.
    Convert failed. Ret: NULL pointer returned.

horcham commented 7 months ago

The Ascend 310 chips on the Atlas 300I Model 3010 may not fully support dynamic-shape conversion. You can try the following:

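  1. Static model: change config.txt to a static shape. Pay attention to the [ascend_context] section header and the input_shape field, which differ from the dynamic-shape config.txt. For example: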
    [ascend_context]
    input_format=NCHW
    input_shape=x:[1,3,736,1280]

    For details, see https://github.com/horcham/mindocr/blob/main/docs/cn/inference/convert_tutorial.md

  2. For the ch_PP-OCRv4_rec_infer model, check whether input_shape is correct. It should be [1,3,48,320].
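As a rough illustration of the two suggestions above, the following sketch generates the text of a config.txt for a static shape. The helper `build_ascend_config` is hypothetical (it is not part of MindOCR or MindSpore Lite); it only assembles the `[ascend_context]`, `input_format`, `input_shape` and `dynamic_dims` keys that appear in this thread, and the input name `x` for the rec model is an assumption carried over from the det example.

```python
from typing import Optional, Sequence, Tuple


def build_ascend_config(
    input_name: str,
    input_shape: Sequence[int],
    dynamic_dims: Optional[Sequence[Tuple[int, ...]]] = None,
    input_format: str = "NCHW",
) -> str:
    """Return config.txt text with an [ascend_context] section (hypothetical helper)."""
    free_dims = sum(1 for dim in input_shape if dim == -1)
    if free_dims and not dynamic_dims:
        raise ValueError("input_shape contains -1 but no dynamic_dims gears were given")
    if dynamic_dims and not free_dims:
        raise ValueError("dynamic_dims given but input_shape has no -1 dimension")

    shape_text = ",".join(str(dim) for dim in input_shape)
    lines = [
        "[ascend_context]",
        f"input_format={input_format}",
        f"input_shape={input_name}:[{shape_text}]",
    ]
    if dynamic_dims:
        for gear in dynamic_dims:
            # Each gear must supply one value per -1 dimension.
            if len(gear) != free_dims:
                raise ValueError(f"gear {gear} does not match {free_dims} free dims")
        gears_text = ",".join(
            "[" + ",".join(str(value) for value in gear) + "]" for gear in dynamic_dims
        )
        lines.append(f"dynamic_dims={gears_text}")
    return "\n".join(lines) + "\n"


# Static det config, matching the example in the reply above.
static_det = build_ascend_config("x", [1, 3, 736, 1280])
# Static rec config with the suggested shape; the input name "x" is assumed.
static_rec = build_ascend_config("x", [1, 3, 48, 320])
print(static_det)
print(static_rec)
```

The resulting text can be written to config.txt before running converter_lite. Check the real input name of each exported model rather than relying on `x`.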
adamzhg commented 7 months ago

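The failure to generate a Lite MindIR file for ch_PP-OCRv4_rec_infer was resolved by downgrading mindspore-lite from 2.2.13 to 2.2.10. However, I have run into the following new problems.

I. Environment

- Server: Huawei Kunpeng server
- OS: Ubuntu 20.04
- Accelerator card: Atlas 300I Model 3010 (with 4 Ascend 310 chips)
- Python: 3.8
- mindspore-lite: 2.2.10
- CANN: 7.0.0.beta1
- Firmware and driver: 1.0.22.alpha (A300-3010-npu-firmware_7.1.0.3.220.run, A300-3010-npu-driver_23.0.0_linux-aarch64.run)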

II. Problem description

Issue 1:

1. Following the official documentation, I generated Lite MindIR files for ch_PP-OCRv4_det_server_infer, ch_ppocr_mobile_v2.0_cls_infer and ch_PP-OCRv4_rec_server_infer (I used the server variants of OCRv4, hoping for higher accuracy).

2. I ran the following command:

    python deploy/py_infer/infer.py \
        --input_images_dir=XXX \
        --det_model_path=xxx/ch_PP-OCRv4_det_server_infer/det_db_dynamic_output.mindir \
        --det_model_name_or_config=ch_pp_det_OCRv4 \
        --cls_model_path=xxx/ch_ppocr_mobile_v2.0_cls_infer/cls_mv4_dynamic_output.mindir \
        --cls_model_name_or_config=ch_pp_mobile_cls_v2.0 \
        --rec_model_path=xxx/ch_PP-OCRv4_rec_server_infer/rec_crnn_dynamic_output.mindir \
        --rec_model_name_or_config=ch_pp_rec_OCRv4 \
        --character_dict_path=xxx/ppocr_keys_v1.txt \
        --res_save_dir=xxx/output \
        --vis_pipeline_save_dir=xxx/output \
        --show_log=True

3. It failed with:

    File "/data/sunxh/adamzhg/projects/mindspore/mindocr/deploy/py_infer/src/data_process/preprocess/transforms/det_transforms.py", line 33, in __init__
      super().__init__(
    TypeError: __init__() got multiple values for keyword argument 'limit_side_len'

Issue 2:

1. Building on Issue 1, I modified the code in mindocr/deploy/py_infer/src/data_process/preprocess/transforms/det_transforms.py as follows:

    skipped = ("target_size", "limit_type", "limit_side_len", "force_divisable")

    skipped = ("target_size", "limit_type", "force_divisable")

so that limit_side_len is removed from kwargs.

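2. Text detection and recognition on the images then completed successfully, but the accuracy is terrible. Could you help identify what is wrong? Do the configurations passed via --det_model_name_or_config and --cls_model_name_or_config need to be adjusted?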
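The `TypeError` from Issue 1 can be reproduced in isolation, which may help when reasoning about the `skipped` tuple. The classes below are hypothetical stand-ins, not MindOCR's actual transforms: they only show that passing a keyword explicitly while the same key is still present in `**kwargs` raises this error, and that popping the key from `kwargs` first avoids it.

```python
class BaseResize:
    """Hypothetical stand-in for the parent transform."""

    def __init__(self, target_size=None, limit_side_len=960, **kwargs):
        self.target_size = target_size
        self.limit_side_len = limit_side_len
        self.extra = kwargs


class ClashingResize(BaseResize):
    def __init__(self, **kwargs):
        # limit_side_len is passed explicitly AND may still be inside kwargs,
        # so Python sees two values for the same keyword.
        super().__init__(target_size=None, limit_side_len=960, **kwargs)


class PoppingResize(BaseResize):
    def __init__(self, **kwargs):
        # Drop every key that is also passed explicitly below.
        skipped = ("target_size", "limit_side_len")
        for name in skipped:
            kwargs.pop(name, None)
        super().__init__(target_size=None, limit_side_len=960, **kwargs)


def try_build(cls, **kwargs):
    """Return 'ok' if construction succeeds, else the TypeError message."""
    try:
        cls(**kwargs)
        return "ok"
    except TypeError as exc:
        return str(exc)


# A config that happens to carry limit_side_len, as a YAML pipeline entry might.
config = {"limit_side_len": 736, "interpolation": "linear"}
print(try_build(ClashingResize, **config))
print(try_build(PoppingResize, **config))
```

Note that in this sketch the value supplied by the config (736) is silently replaced by the hard-coded 960 once the key is popped, so a change of this kind also changes which resize limit is actually used.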

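I would also appreciate answers to the following questions:

1. When converting to MindIR, does the fixed input size mean that images must be preprocessed to that size before being fed to the model? For example, for det I set:

    [ascend_context]
    input_format=NCHW
    input_shape=x:[1,3,-1,-1]
    dynamic_dims=[736,1280],[768,1280],[896,1280],[1024,1280]

Does this mean each image must first be resized to one of [736,1280], [768,1280], [896,1280] or [1024,1280]?

2. Does input_format=NCHW mean the image must be converted to NCHW layout before it is fed to the model? PPOCR does not require this conversion by default; the NHWC data read by OpenCV is fed in directly. Is that where the problem lies?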
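To make the two questions concrete, here is a minimal sketch of what such preprocessing could look like: pick one of the configured gears, scale the image to fit it while keeping the aspect ratio, pad the rest with zeros, and transpose from HWC to NCHW. This is an assumption about one reasonable approach, not a description of what MindOCR's `infer.py` does internally. The helper names are hypothetical, nearest-neighbour resizing is used only to avoid an OpenCV dependency, and normalization is omitted.

```python
import numpy as np

# Gears copied from the dynamic_dims line above, as (height, width).
GEARS = [(736, 1280), (768, 1280), (896, 1280), (1024, 1280)]


def pick_gear(height, width, gears):
    """Smallest gear that contains the image, else the largest gear."""
    fitting = [g for g in gears if g[0] >= height and g[1] >= width]
    if fitting:
        return min(fitting, key=lambda g: g[0] * g[1])
    return max(gears, key=lambda g: g[0] * g[1])


def resize_nearest(image_hwc, new_h, new_w):
    """Nearest-neighbour resize using integer index maps."""
    h, w = image_hwc.shape[:2]
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return image_hwc[rows[:, None], cols]


def fit_to_gear(image_hwc, gear):
    """Scale down if needed (keeping aspect ratio), then zero-pad to the gear."""
    gear_h, gear_w = gear
    h, w = image_hwc.shape[:2]
    scale = min(1.0, gear_h / h, gear_w / w)
    new_h = min(gear_h, max(1, round(h * scale)))
    new_w = min(gear_w, max(1, round(w * scale)))
    canvas = np.zeros((gear_h, gear_w, image_hwc.shape[2]), dtype=image_hwc.dtype)
    canvas[:new_h, :new_w] = resize_nearest(image_hwc, new_h, new_w)
    return canvas


def preprocess(image_hwc, gears):
    """Return a float32 array of shape (1, C, gear_h, gear_w)."""
    gear = pick_gear(image_hwc.shape[0], image_hwc.shape[1], gears)
    padded = fit_to_gear(image_hwc, gear)
    chw = np.transpose(padded, (2, 0, 1))  # HWC -> CHW
    return np.expand_dims(chw, 0).astype(np.float32)  # add the batch dimension


image = np.zeros((700, 1000, 3), dtype=np.uint8)  # stand-in for an OpenCV image
print(preprocess(image, GEARS).shape)
```

Whichever strategy is used, the scale and padding applied here must be undone when mapping detected boxes back to the original image, otherwise the boxes will look misplaced even if the model output is correct.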

horcham commented 7 months ago

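The det model's detection results may be the problem. The 310 may not be well adapted to dynamic shapes, and once a ppocr model is converted with a static shape or with shape gears, images are scaled to the corresponding shape before being fed to the network, which can degrade detection quality. You could try one of MindOCR's own models such as dbnet_resnet50, or ch_pp_det_OCRv4, and run det on its own to check the result: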

    python deploy/py_infer/infer.py \
        --input_images_dir=XXX \
        --det_model_path=xxx/xxxx.mindir \
        --det_model_name_or_config=xxxx \
        --res_save_dir=/path/to/results \
        --vis_det_save_dir=/path/to/vis_results

If you run the model with dbnet_resnet50, use configs/det/dbnet/db_r50_icdar15.yaml as the config file.