modelbox-ai / modelbox

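A high-performance, highly extensible, easy-to-use framework for AI applications. It gives AI application developers a unified, high-performance, easy-to-use programming framework for quickly building industry AI applications that span device, edge, and cloud on top of full-stack AI services, with GPU and NPU acceleration.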
https://modelbox-ai.com
Apache License 2.0
133 stars 39 forks

Error when running inference #304

Open Ymh13383894400 opened 1 year ago

Ymh13383894400 commented 1 year ago

[2022-10-27 15:31:15,764][ INFO][ flow.cc:97 ] run flow dectection_sedna/src/graph/graph_dectection_sedna.toml
[2022-10-27 15:31:15,793][ INFO][ driver.cc:715 ] Gather scan info success, drivers count 46
[2022-10-27 15:31:15,793][ INFO][ driver.cc:961 ] begin scan virtual drivers
[2022-10-27 15:31:16,739][ INFO][virtualdriver_inference.cc:80 ] Add virtual driver /root/dectection_sedna/src/flowunit/helm_infer/helm_infer.toml success
[2022-10-27 15:31:16,755][ INFO][virtualdriver_python.cc:78 ] Add virtual driver /root/dectection_sedna/src/flowunit/yolo3_post/yolo3_post.toml success
[2022-10-27 15:31:16,779][ INFO][ driver.cc:963 ] end scan virtual drivers
[2022-10-27 15:31:16,782][ INFO][ graph_manager.cc:304 ] graph.format : graphviz
[2022-10-27 15:31:16,785][ WARN][ driver_desc.cc:74 ] set cuda device flags 0 failed, cuda ret 35
[2022-10-27 15:31:16,786][ERROR][ devicecuda.cc:87 ] count device failed, cuda ret 35
[2022-10-27 15:31:17,333][ WARN][ flowunit.cc:117 ] inference is not match, you can use a-z, A-Z, 1-9, and uppercase the first character.
[2022-10-27 15:31:17,333][ WARN][virtualdriver_inference.cc:358 ] check group type failed , your group_type is inference, the right group_type is a or a/b , for instance input or input/http.
[2022-10-27 15:31:17,336][ WARN][ flowunit.cc:117 ] generic is not match, you can use a-z, A-Z, 1-9, and uppercase the first character.
[2022-10-27 15:31:17,336][ WARN][virtualdriver_python.cc:358 ] check group type failed , your group_type is generic, the right group_type is a or a/b , for instance input or input/http.
[2022-10-27 15:31:17,338][ INFO][ graph.cc:116 ] Build graph name:graph_dectection_sedna, id:3a8b9fb4-b241-434a-87e9-747213c50737
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:218 ] node name : helm_infer6
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:223 ] input port : images
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:229 ] output port : boxes
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:229 ] output port : classes
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:229 ] output port : scores
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:218 ] node name : normalize5
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:223 ] input port : in_data
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:229 ] output port : out_data
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:218 ] node name : packed_planar_transpose4
[2022-10-27 15:31:17,338][ INFO][ graph_manager.cc:223 ] input port : in_image
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:229 ] output port : out_image
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : resize3
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_image
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:229 ] output port : out_image
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : video_decoder2
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_video_packet
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:229 ] output port : out_video_frame
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : video_demuxer1
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_video_url
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:229 ] output port : out_video_packet
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : video_input0
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:229 ] output port : out_video_url
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : videoencoder
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_video_frame
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:218 ] node name : yolo3_post
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_boxes
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_classes
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_image
[2022-10-27 15:31:17,339][ INFO][ graph_manager.cc:223 ] input port : in_scores
[2022-10-27 15:31:17,340][ INFO][ graph_manager.cc:229 ] output port : out_image
[2022-10-27 15:31:17,340][ INFO][ graph.cc:641 ] begin build node helm_infer6
[2022-10-27 15:31:17,342][ WARN][ flowunit.cc:117 ] inference is not match, you can use a-z, A-Z, 1-9, and uppercase the first character.
[2022-10-27 15:31:17,342][ WARN][virtualdriver_inference.cc:358 ] check group type failed , your group_type is inference, the right group_type is a or a/b , for instance input or input/http.
[2022-10-27 15:31:17,343][ INFO][ graph.cc:647 ] build node helm_infer6 success
[2022-10-27 15:31:17,344][ INFO][ graph.cc:641 ] begin build node normalize5
[2022-10-27 15:31:17,344][ INFO][ graph.cc:647 ] build node normalize5 success
[2022-10-27 15:31:17,344][ INFO][ graph.cc:641 ] begin build node packed_planar_transpose4
[2022-10-27 15:31:17,344][ INFO][ graph.cc:647 ] build node packed_planar_transpose4 success
[2022-10-27 15:31:17,344][ INFO][ graph.cc:641 ] begin build node resize3
[2022-10-27 15:31:17,345][ INFO][ graph.cc:647 ] build node resize3 success
[2022-10-27 15:31:17,345][ INFO][ graph.cc:641 ] begin build node video_decoder2
[2022-10-27 15:31:17,345][ INFO][ graph.cc:647 ] build node video_decoder2 success
[2022-10-27 15:31:17,345][ INFO][ graph.cc:641 ] begin build node video_demuxer1
[2022-10-27 15:31:17,345][ INFO][ graph.cc:647 ] build node video_demuxer1 success
[2022-10-27 15:31:17,345][ INFO][ graph.cc:641 ] begin build node video_input0
[2022-10-27 15:31:17,346][ INFO][ graph.cc:647 ] build node video_input0 success
[2022-10-27 15:31:17,346][ INFO][ graph.cc:641 ] begin build node videoencoder
[2022-10-27 15:31:17,346][ INFO][ graph.cc:647 ] build node videoencoder success
[2022-10-27 15:31:17,346][ INFO][ graph.cc:641 ] begin build node yolo3_post
[2022-10-27 15:31:17,350][ WARN][ flowunit.cc:117 ] generic is not match, you can use a-z, A-Z, 1-9, and uppercase the first character.
[2022-10-27 15:31:17,350][ WARN][virtualdriver_python.cc:358 ] check group type failed , your group_type is generic, the right group_type is a or a/b , for instance input or input/http.
[2022-10-27 15:31:17,351][ INFO][ graph.cc:647 ] build node yolo3_post success
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, helm_infer6:boxes -> yolo3_post:in_boxes
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, helm_infer6:classes -> yolo3_post:in_classes
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, helm_infer6:scores -> yolo3_post:in_scores
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, normalize5:out_data -> helm_infer6:images
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, packed_planar_transpose4:out_image -> normalize5:in_data
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, resize3:out_image -> packed_planar_transpose4:in_image
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, video_decoder2:out_video_frame -> resize3:in_image
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, video_decoder2:out_video_frame -> yolo3_post:in_image
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, video_demuxer1:out_video_packet -> video_decoder2:in_video_packet
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, video_input0:out_video_url -> video_demuxer1:in_video_url
[2022-10-27 15:31:17,351][ INFO][ graph.cc:368 ] add link, yolo3_post:out_image -> videoencoder:in_video_frame
[2022-10-27 15:31:17,353][ INFO][tensorflow_inference_common.cc:352 ] is_save_model: 0
[2022-10-27 15:31:17,353][ INFO][tensorflow_inference_common.cc:138 ] model path: /root/dectection_sedna/src/flowunit/helm_infer/model.pb
[2022-10-27 15:31:17,354][ INFO][flowunit_group.cc:393 ] node: packed_planar_transpose4 get batch size is 8
[2022-10-27 15:31:17,356][ INFO][flowunit_group.cc:393 ] node: resize3 get batch size is 8
[2022-10-27 15:31:17,356][ INFO][flowunit_group.cc:393 ] node: normalize5 get batch size is 8
[2022-10-27 15:31:17,358][ INFO][flowunit_group.cc:393 ] node: video_decoder2 get batch size is 1
[2022-10-27 15:31:17,363][ INFO][flowunit_group.cc:393 ] node: video_demuxer1 get batch size is 1
[2022-10-27 15:31:17,362][ INFO][flowunit_group.cc:393 ] node: video_input0 get batch size is 8
[2022-10-27 15:31:17,364][ INFO][session_context.cc:40 ] session context start se id:c3d57c49-96b5-4725-9f88-240685d00050
[2022-10-27 15:31:17,371][ INFO][flowunit_group.cc:393 ] node: videoencoder get batch size is 1
2022-10-27 15:31:21.637629: E tensorflow/core/common_runtime/session_factory.cc:48] Two session factories are being registered underGRPC_SESSION
[libprotobuf ERROR external/com_google_protobuf/src/google/protobuf/descriptor_database.cc:118] File already exists in database: tensorflow/core/data/service/common.proto
[libprotobuf FATAL external/com_google_protobuf/src/google/protobuf/descriptor.cc:1379] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
[2022-10-27 15:31:21,639][ WARN][flowunit_group.cc:363 ] yolo3_post: open failed: code: Invalid argument, errmsg: import yolo3_post@Yolo3_postFlowUnit failed: CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
[2022-10-27 15:31:21,640][ INFO][flowunit_group.cc:393 ] node: yolo3_post get batch size is 1
[2022-10-27 15:31:21,640][ERROR][ node.cc:384 ] open flowunit yolo3_post failed
2022-10-27 15:31:27.630367: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

pymumu commented 1 year ago

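Which operating system are you using?

It looks like the TensorFlow versions don't match. Could two copies of TensorFlow be loaded? Did you load the model from Python code and then load it a second time from C?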
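One way to test the "two copies of TensorFlow" hypothesis is to look for a TensorFlow import in the Python flowunit sources, since the inference flowunit already loads TensorFlow natively. The sketch below is only an illustration: `find_tensorflow_imports` is a hypothetical helper, not part of ModelBox, and it inspects the source text statically with the standard `ast` module.

```python
import ast
from typing import List


def find_tensorflow_imports(source: str) -> List[int]:
    """Return the line numbers of statements that import tensorflow."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"
            names = [node.module or ""]
        else:
            continue
        # Match "tensorflow" and its submodules, but not e.g. "tensorflow_hub".
        if any(n == "tensorflow" or n.startswith("tensorflow.") for n in names):
            hits.append(node.lineno)
    return sorted(hits)
```

Reading a flowunit's `.py` file and passing its text to this function lists the lines worth checking. An import that happens indirectly, through another package, would not be caught by this static scan.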

Ymh13383894400 commented 1 year ago

I'm on Ubuntu. After I commented out the code that the yolo3_post flowunit calls, I get the logs below:
[2022-10-27 16:07:17,004][ INFO][flowunit_group.cc:393 ] node: videoencoder get batch size is 1
[2022-10-27 16:07:17,811][ INFO][flowunit_group.cc:393 ] node: yolo3_post get batch size is 1
2022-10-27 16:07:20.323038: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-10-27 16:07:20.326108: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib/python3.7/dist-packages/cv2/../../lib64:
2022-10-27 16:07:20.326243: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-10-27 16:07:20.326379: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (32a50441c14b): /proc/driver/nvidia/version does not exist
[2022-10-27 16:07:20,329][ERROR][tensorflow_inference_common.cc:374 ] fill input failed, err: Fault, can't init op images:0
[2022-10-27 16:07:20,329][ERROR][tensorflow_inference_common.cc:409 ] init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:07:20,329][ WARN][flowunit_group.cc:363 ] helm_infer: open failed: code: Fault, errmsg: init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:07:20,329][ INFO][flowunit_group.cc:393 ] node: helm_infer6 get batch size is 8
[2022-10-27 16:07:20,329][ERROR][ node.cc:384 ] open flowunit helm_infer failed
[2022-10-27 16:07:20,331][ERROR][ flow.cc:511 ] build graph failed, Fault, build graph failed, please check graph config. -> open flowunit 'helm_infer', type 'cpu' failed. -> init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0 -> fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:07:20,332][ERROR][ flow.cc:106 ] build flow failed, Fault, build graph failed, please check graph config. -> open flowunit 'helm_infer', type 'cpu' failed. -> init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0 -> fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:07:20,336][ INFO][ session.cc:107 ] session 95bd6f99-196a-416a-b859-9de757bdecd9 is over, running session count 0
[2022-10-27 16:07:20,336][ INFO][session_context.cc:44 ] session context finish se id:95bd6f99-196a-416a-b859-9de757bdecd9

pymumu commented 1 year ago

The TensorFlow version doesn't seem to match.

It's best to use the ModelBox image directly, which avoids environment problems.

Ymh13383894400 commented 1 year ago

I found that I had imported tensorflow myself... Running it again gives the problem below. Is the model itself at fault?
022-10-27 16:21:27.183801: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-10-27 16:21:27.184133: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (32a50441c14b): /proc/driver/nvidia/version does not exist
[2022-10-27 16:21:27,189][ERROR][tensorflow_inference_common.cc:374 ] fill input failed, err: Fault, can't init op images:0
[2022-10-27 16:21:27,189][ERROR][tensorflow_inference_common.cc:409 ] init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:21:27,189][ WARN][flowunit_group.cc:363 ] helm_infer: open failed: code: Fault, errmsg: init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:21:27,189][ INFO][flowunit_group.cc:393 ] node: helm_infer6 get batch size is 8
[2022-10-27 16:21:27,189][ERROR][ node.cc:384 ] open flowunit helm_infer failed
[2022-10-27 16:21:27,193][ERROR][ flow.cc:511 ] build graph failed, Fault, build graph failed, please check graph config. -> open flowunit 'helm_infer', type 'cpu' failed. -> init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0 -> fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:21:27,193][ERROR][ flow.cc:106 ] build flow failed, Fault, build graph failed, please check graph config. -> open flowunit 'helm_infer', type 'cpu' failed. -> init config failed: Fault, fill input failed, err: Fault, can't init op images:0 -> can't init op images:0 -> fill input failed, err: Fault, can't init op images:0 -> can't init op images:0
[2022-10-27 16:21:27,202][ INFO][ session.cc:107 ] session 766f73f8-71aa-4635-bf37-9c5838207bea is over, running session count 0
[2022-10-27 16:21:27,202][ INFO][session_context.cc:44 ] session context finish se id:766f73f8-71aa-4635-bf37-9c5838207bea

pymumu commented 1 year ago

TensorFlow failed to initialize CUDA. Does your machine have an Nvidia GPU and CUDA installed?
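The two symptoms in the log (no `/proc/driver/nvidia/version`, and `libcuda.so.1` failing to load) can be checked directly from Python. This is a minimal sketch using only the standard library; the function names are made up for illustration, and the default path and library name are simply the ones the log reports.

```python
import ctypes
import os


def nvidia_driver_present(proc_path: str = "/proc/driver/nvidia/version") -> bool:
    """True if the kernel driver's version file exists (path taken from the log)."""
    return os.path.exists(proc_path)


def libcuda_loadable(name: str = "libcuda.so.1") -> bool:
    """True if the dynamic loader can find and open the CUDA driver library."""
    try:
        ctypes.CDLL(name)
    except OSError:
        return False
    return True
```

If both return `False`, the host (or container) has no usable Nvidia driver, and the `cuInit` warnings are expected on a CPU-only machine.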

Ymh13383894400 commented 1 year ago

I'm using the modelbox-tensorflow-ubuntu image, which ships with tensorflow-gpu 2.4. My machine has no GPU, so I set every flowunit to device=cpu deviceid="0". I can't tell whether this is a TensorFlow version problem. Do I need to reinstall TensorFlow?

pymumu commented 1 year ago

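First check that your model's input and output port names match the configuration in the toml file. Could you post the toml files so we can take a look?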

Ymh13383894400 commented 1 year ago

infer.toml

[base]
name = "helm_infer" # The FlowUnit name
device = "cpu" # The device the flowunit runs on, cpu, cuda, ascend
version = "1.0.0" # The version of the flowunit
description = "A flowunit for modelbox" # The description of the flowunit
group_type = "inference" # flowunit group attribution
stream = true # flowunit type
type = "inference" # Fixed value
entry = "model2.pb" # model file path
virtual_type = "tensorflow" # inference engine type: 'tensorflow', 'tensorrt', 'torch', 'acl', 'mindspore'

# Input ports description
[input]
[input.input1]
device = "cpu"
name = "image_data"
type = "float"

# Output ports description
[output]
[output.output1]
name = "classes"
type = "float"

[output.output2]
name = "scores"
type = "float"

[output.output3]
name = "boxes"
type = "float"

graph.toml

[driver]
skip-default = false
dir=[ "/root/dectection_sedna/src/flowunit" ]
[profile]
profile=false
trace=false
dir=""
[graph]
format = "graphviz"
graphconf = '''digraph graph_dectection_sedna {
video_input0 [ type=flowunit flowunit=video_input device=cpu deviceid="0" source_url="/root/dectection_sedna/src/video.mp4" ]
video_demuxer1 [ type=flowunit flowunit=video_demuxer device=cpu deviceid="0" ]
video_decoder2 [ type=flowunit flowunit=video_decoder device=cpu deviceid="0" pix_fmt=rgb ]
resize3 [ type=flowunit flowunit=resize device=cpu deviceid="0" image_height="416" image_width="416" interpolation=inter_linear ]
packed_planar_transpose4 [ type=flowunit flowunit=packed_planar_transpose device=cpu deviceid="0" ]
normalize5 [ type=flowunit flowunit=normalize device=cpu deviceid="0" standard_deviation_inverse="0.003921568627451,0.003921568627451,0.003921568627451" ]
helm_infer6 [ type=flowunit flowunit=helm_infer device=cpu deviceid="0"]
yolo3_post [type=flowunit, flowunit=yolo3_post, device=cpu, deviceid="0" ]
videoencoder [type=flowunit, flowunit=video_encoder, device=cpu, deviceid="0", encoder=mpeg4, format=mp4, default_dest_url="/root/dectection_sedna/tmp/video_tmp.mp4"]

video_input0:"out_video_url" -> video_demuxer1:"in_video_url"
video_demuxer1:"out_video_packet" -> video_decoder2:"in_video_packet"
video_decoder2:"out_video_frame" -> resize3:"in_image"
resize3:"out_image" -> packed_planar_transpose4:"in_image"
packed_planar_transpose4:"out_image" -> normalize5:"in_data"
#normalize5:"out_data" -> helm_infer6:"input_image_shape"
normalize5:"out_data" -> helm_infer6:"image_data"
video_decoder2:"out_video_frame" -> yolo3_post:"in_image"
helm_infer6:"classes" -> yolo3_post:"in_classes"
helm_infer6:"scores" -> yolo3_post:"in_scores"
helm_infer6:"boxes" -> yolo3_post:"in_boxes"
yolo3_post:"out_image" -> videoencoder:"in_video_frame"

}'''

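pymumu commented 1 year ago

[input]
[input.input1]
device = "cpu"
name = "image_data"
type = "float"

Change this to

name = "image"

Judging only from your log, though, the name image may well be wrong too. You need to check what the input is actually called inside your model, then put that name in both toml files.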
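Keeping the two files in step can be checked mechanically. The sketch below is illustrative only: both functions are hypothetical helpers, not ModelBox APIs. It collects the port names that the graph's edges use for one node and reports any that the flowunit toml does not declare. It assumes edges are written as `node:"port"`, as in the graph.toml above, and that a line starting with `#` is a commented-out edge.

```python
import re
from typing import Iterable, Set


def ports_used_in_graph(graphconf: str, node: str) -> Set[str]:
    """Collect every port referenced as node:"port" in the graph edges."""
    pattern = re.compile(r"\b" + re.escape(node) + r':"([^"]+)"')
    ports: Set[str] = set()
    for line in graphconf.splitlines():
        if line.lstrip().startswith("#"):
            continue  # assumption: a leading '#' marks a commented-out edge
        ports.update(pattern.findall(line))
    return ports


def undeclared_ports(graphconf: str, node: str, declared: Iterable[str]) -> Set[str]:
    """Ports the graph uses for `node` that the flowunit toml does not declare."""
    return ports_used_in_graph(graphconf, node) - set(declared)
```

For the files posted here, `declared` would be the `name` values from infer.toml (`image_data`, `classes`, `scores`, `boxes`). An empty result only shows that the two toml files agree with each other. Whether the name also exists in the model is a separate check.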

Ymh13383894400 commented 1 year ago

With image, the error is:
[2022-10-27 16:21:27,189][ERROR][tensorflow_inference_common.cc:374 ] fill input failed, err: Fault, can't init op image:0
With image_data, the error is:
[2022-10-27 16:21:27,189][ERROR][tensorflow_inference_common.cc:374 ] fill input failed, err: Fault, can't init op image_data:0

pymumu commented 1 year ago

It has to be the input name that is actually used inside your model file.
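One way to find that name, assuming the model is a frozen TensorFlow graph (a `.pb` holding a serialized `GraphDef`, which the `is_save_model: 0` log line suggests but does not prove), is to list the graph's `Placeholder` ops. The sketch below should be run as a standalone script rather than inside a flowunit. Both function names are illustrative.

```python
from typing import List


def load_graph_def(pb_path: str):
    """Parse a frozen TensorFlow graph (.pb) into a GraphDef."""
    import tensorflow as tf  # imported lazily; needs TensorFlow installed

    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    return graph_def


def placeholder_names(graph_def) -> List[str]:
    """Names of the Placeholder ops, which are usually the model's inputs."""
    return [node.name for node in graph_def.node if node.op == "Placeholder"]
```

Calling `placeholder_names(load_graph_def(path))` on the model file prints candidate input names. The errors above mention `images:0`, `image:0`, and `image_data:0`, so the flowunit appears to append `:0` itself, and the toml would take the bare op name. Some models expose inputs through other op types (for example `PlaceholderWithDefault`), so an empty list does not mean the model has no inputs.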