isarsoft / yolov4-triton-tensorrt

This repository deploys YOLOv4 as an optimized TensorRT engine to Triton Inference Server
http://www.isarsoft.com

Python inference returns 'shape': [1, 1] #4

Closed PyCod closed 3 years ago

PyCod commented 3 years ago

Cool repo, thanks for sharing! I got yolov4 working great on triton thanks to you.

I'm interested in writing a Python inference client (and sharing it back to the repo once it's done). I started with the image_client.py example script from the official NVIDIA repo, and after a little tweaking I got stuck.

Everything seems to work just fine: I send an image to the Triton server over HTTP and get a response back, but things go wrong in postprocessing.

Below are the logs from the Triton server when I send a request with the Python client:

I0918 13:17:27.860435 1 http_server.cc:1185] HTTP request: 0 /v2/models/yolov4
I0918 13:17:27.860469 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:27.860480 1 model_repository_manager.cc:564] VersionStates() 'yolov4'
I0918 13:17:27.860726 1 http_server.cc:1185] HTTP request: 0 /v2/models/yolov4/config
I0918 13:17:27.860736 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:28.721807 1 http_server.cc:1185] HTTP request: 2 /v2/models/yolov4/infer
I0918 13:17:28.721850 1 model_repository_manager.cc:608] GetInferenceBackend() 'yolov4' version -1
I0918 13:17:28.768381 1 infer_request.cc:347] add original input: [0x0x7f987c005ac0] request id: 1, model: yolov4, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 0, priority: 0, timeout (us): 0
original inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: []
override inputs:
inputs:
original requested outputs:
requested outputs:

I0918 13:17:28.908998 1 infer_request.cc:480] prepared: [0x0x7f987c005ac0] request id: 1, model: yolov4, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: [3,608,608]
override inputs:
inputs:
[0x0x7f97a94fdbd8] input: data, type: FP32, original shape: [1,3,608,608], shape: [3,608,608]
original requested outputs:
prob
requested outputs:
prob

I0918 13:17:28.909074 1 plan_backend.cc:1634] Running yolov4_0_gpu0 with 1 requests
I0918 13:17:28.909120 1 plan_backend.cc:2355] Optimization profile default [0] is selected for yolov4_0_gpu0
I0918 13:17:28.909147 1 pinned_memory_manager.cc:130] pinned memory allocation: size 4435968, addr 0x7f994a000090
I0918 13:17:28.914680 1 plan_backend.cc:1894] Context with profile default [0] is being executed for yolov4_0_gpu0
I0918 13:17:28.917198 1 infer_response.cc:74] add response output: output: prob, type: FP32, shape: [1,7001,1,1]
I0918 13:17:28.917220 1 http_server.cc:1136] HTTP: unable to provide 'prob' in GPU, will use CPU
I0918 13:17:28.917232 1 http_server.cc:1156] HTTP using buffer for: 'prob', size: 28004, addr: 0x7f97f3f9bf00
I0918 13:17:28.917240 1 pinned_memory_manager.cc:130] pinned memory allocation: size 28004, addr 0x7f994a43b0a0
I0918 13:17:28.917267 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f994a000090
I0918 13:17:28.923462 1 pinned_memory_manager.cc:157] pinned memory deallocation: addr 0x7f994a43b0a0
I0918 13:17:28.923628 1 http_server.cc:1171] HTTP release: size 28004, addr 0x7f97f3f9bf00

We can see that the output layer prob is supposed to have shape [1,7001,1,1], but in the Python client I only receive a [1, 1] shape:

{'id': '1', 'model_name': 'yolov4', 'model_version': '1', 'outputs': [{'name': 'prob', 'datatype': 'BYTES', 'shape': [1, 1], 'data': ['633.535400:3']}]}

It probably has something to do with the output being interpreted as BYTES? If I crack this and get a working Python script, I'll PR it back to this repo :)
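[Editor's note: a likely cause, stated as an assumption. image_client.py passes a class_count / -c value to the requested output, which enables Triton's classification extension; the server then returns only the top-k entries of prob as BYTES strings of the form "<value>:<index>" (optionally ":<label>") instead of the raw FP32 tensor, which would explain the [1, 1] shape. The string from the response above can be parsed with a small hypothetical helper:]

```python
# Hypothetical helper: Triton's classification extension encodes each top-k
# entry as "<value>:<index>" (optionally with a trailing ":<label>") rather
# than returning the raw tensor. Parse the string from the response above.
def parse_classification(entry: str):
    parts = entry.split(":")
    value, index = float(parts[0]), int(parts[1])
    label = parts[2] if len(parts) > 2 else None
    return value, index, label

value, index, label = parse_classification("633.535400:3")
# value -> 633.5354, index -> 3 (i.e. element 3 of the 7001-long prob tensor)
```

If this is the cause, the fix would be to request the output without the class_count option so the raw FP32 tensor comes back.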

Thanks!

philipp-schmidt commented 3 years ago

There is an implementation here: https://github.com/penolove/yolov4_triton_client/blob/master/simple_grpc_infer_client.py

I will include something similar in this repo soon, but maybe this helps you spot the difference and the issue until then.
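[Editor's note: once the client receives the raw FP32 output rather than the classification strings, the [1, 7001, 1, 1] tensor still needs decoding. A minimal sketch, under the assumption that the repo's YOLO TensorRT plugin lays the output out as one detection count followed by up to 1000 detections of 7 floats each (bbox x/y/w/h, detection confidence, class id, class confidence; 1 + 1000 * 7 = 7001):]

```python
import numpy as np

# Assumed layout of the raw output tensor [1, 7001, 1, 1]:
#   element 0         -> number of valid detections
#   elements 1..      -> up to 1000 detections of 7 floats each
def decode_detections(prob: np.ndarray) -> np.ndarray:
    flat = prob.reshape(-1)              # (7001,)
    num = int(flat[0])                   # detection count
    return flat[1:1 + num * 7].reshape(num, 7)

# Fake output with two detections, purely to illustrate the layout.
fake = np.zeros(7001, dtype=np.float32)
fake[0] = 2
fake[1:15] = np.arange(14, dtype=np.float32)
dets = decode_detections(fake.reshape(1, 7001, 1, 1))
# dets has shape (2, 7); each row is one detection
```

The confidence threshold and NMS steps from the repo's C++ client would still need to be applied on top of this.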

philipp-schmidt commented 3 years ago

Stay tuned for a python client in this repo, hopefully with support for video etc.

philipp-schmidt commented 3 years ago

@PyCod this repo now has a full client implementation under clients/python/

If you run into trouble with the new client, please open a new issue; I'll close this one.