triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

P2PNet converted to ONNX returns bad output when used on Triton server #6311

Closed nardaweissman closed 11 months ago

nardaweissman commented 11 months ago

Description

I trained a model with PyTorch using GitHub - TencentYoutuResearch/CrowdCounting-P2PNet (the official code for the ICCV 2021 oral presentation "Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework"). Inference with the trained model looks OK. Training and inference used PyTorch 1.14.0a0+410ce96, on a Docker image based on nvcr.io/nvidia/pytorch:22.12-py3.

I converted the model to ONNX. The resulting ONNX model gives good results in the same environment with onnxruntime 1.15.1.

I then configured the model for Triton, and the server (based on nvcr.io/nvidia/tritonserver:23.08-py3) loaded it successfully. However, when I run inference on that model I do get a result in the callback, but the result is not the expected floating-point vectors; it is byte arrays instead.

I uploaded the ONNX model, the config.pbtxt, and the Python code I use to test the Triton Inference Server.
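For context, a minimal sketch of the export-and-verify flow described above (the checkpoint path, input resolution, output names, and opset version are my placeholders, not the exact values from my scripts):

```python
import onnxruntime as ort
import torch

# Hypothetical export of the trained P2PNet checkpoint; the checkpoint
# path, input resolution, and output names below are placeholders.
model = torch.load("p2pnet.pt", map_location="cpu")
model.eval()
dummy = torch.randn(1, 3, 768, 1024)  # assumed input resolution

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"],
    output_names=["pred_logits", "pred_points"],
    opset_version=13,
)

# Sanity-check the exported graph with onnxruntime 1.15.1
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
outs = sess.run(None, {"input": dummy.numpy()})
for o in outs:
    print(o.shape, o.dtype)  # float32 arrays expected
```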

Triton Information

What version of Triton are you using? nvcr.io/nvidia/tritonserver:23.08-py3

Are you using the Triton container, or did you build it yourself? I built my image from the Triton container.

To Reproduce

The Triton Docker image I created and use is at: https://hub.docker.com/layers/shayweissman1964/myrepo/tritonserver_seetrain/images/sha256-93ad95a2049eaf7b28010f3a29a7f38fd14aae1367edf5da4aa313ef9b1c4d42?context=repo

Steps to reproduce the behavior: I uploaded the directory tree I work in to https://o365questsolution.sharepoint.com/:f:/s/IsraelRD/EtUhL8l8PJRLtlorcj72tuEB_WqdCyc2KUOdp28Zp7n7Ow?e=FqUEmH. The Docker container is launched with IP/NVIDIA_ENGINE_SEETRAIN/docker/triton/launch.sh -r pc. Triton loads 2 models from IP/NVIDIA_ENGINE_SEETRAIN/triton/model_repo (they load successfully for me). To test it I use IP/NVIDIA_ENGINE_SEETRAIN/triton/tests/p2p_unit_test.py (a condensed sketch of what that script does is shown below). The infer request passes successfully and I get a response in the callback routine do_nothing. The problem is that when I look at the result structure I find 2 strange strings instead of 2 FP32 vectors of shape [1, 49152, 2].
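For readers without access to the share, this is roughly what the test script does (a hedged reconstruction; the model name, input name/shape, and output names are my assumptions):

```python
import time

import numpy as np
import tritonclient.grpc as grpcclient

# Hedged reconstruction of p2p_unit_test.py; names and shapes are placeholders.
def do_nothing(result, error):
    if error is not None:
        raise error
    for name in ("pred_logits", "pred_points"):
        out = result.as_numpy(name)
        # Symptom described above: `out` shows up as byte strings here
        # instead of a float32 array of shape [1, 49152, 2].
        print(name, out.dtype, out.shape)

client = grpcclient.InferenceServerClient(url="localhost:8001")
image = np.random.rand(1, 3, 768, 1024).astype(np.float32)
inp = grpcclient.InferInput("input", list(image.shape), "FP32")
inp.set_data_from_numpy(image)
outputs = [grpcclient.InferRequestedOutput(n)
           for n in ("pred_logits", "pred_points")]
client.async_infer("p2p", inputs=[inp], callback=do_nothing, outputs=outputs)
time.sleep(5)  # give the async callback time to fire before exiting
```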

Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble, include the model configuration file for that as well): Training and inference used PyTorch 1.14.0a0+410ce96, on a Docker image based on nvcr.io/nvidia/pytorch:22.12-py3. The model gives reasonable results on that Docker image both as a .pt file and as an .onnx file, and inference with onnxruntime is also good. The config.pbtxt is at IP/NVIDIA_ENGINE_SEETRAIN/triton/model_repo/p2p (a sketch of what that file would look like is shown below).
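For reference, a config.pbtxt along these lines is what such a setup would use (a sketch assuming the ONNX Runtime backend and the tensor names used above, not a verbatim copy of my file):

```
name: "p2p"
platform: "onnxruntime_onnx"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 768, 1024 ]
  }
]
output [
  {
    name: "pred_logits"
    data_type: TYPE_FP32
    dims: [ 1, 49152, 2 ]
  },
  {
    name: "pred_points"
    data_type: TYPE_FP32
    dims: [ 1, 49152, 2 ]
  }
]
```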

Expected behavior: I expect the result that arrives at do_nothing to contain the 2 expected output vectors.
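A hedged diagnostic along these lines could tell a datatype problem apart from corrupted data (it assumes the gRPC client and output names above):

```python
import numpy as np

# If the client reports an output as an object/bytes array rather than
# float32, reinterpret the raw bytes to see whether the expected
# 1 * 49152 * 2 floats are actually in there.
def inspect_output(result, name="pred_points"):
    out = result.as_numpy(name)
    print(name, "dtype:", out.dtype, "shape:", out.shape)
    if out.dtype == np.object_:
        decoded = np.frombuffer(out.reshape(-1)[0], dtype=np.float32)
        print("reinterpreted as float32, element count:", decoded.size)
```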

nardaweissman commented 11 months ago

Following recommendations from NVIDIA, I tested the model with the onnx.checker.check_model routine, which succeeded. I tested with trtexec as well, and that also succeeded.
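For reference, those two checks amount to the following (the file name is a placeholder):

```python
import onnx

# Structural validation of the exported graph; raises if the graph is invalid.
model = onnx.load("model.onnx")
onnx.checker.check_model(model)
print("onnx.checker passed")

# The TensorRT parse/build check is run from a shell:
#   trtexec --onnx=model.onnx
```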

nardaweissman commented 11 months ago

I can provide all of my environment files, but many of the file types (.py, .onnx, ...) are not supported as attachments here. What should I do?

nardaweissman commented 11 months ago

I pushed a tritonclient Docker image that replicates my issue. It runs the p2p_unit_test.py that I uploaded to https://o365questsolution.sharepoint.com/:f:/s/IsraelRD/EtUhL8l8PJRLtlorcj72tuEB_WqdCyc2KUOdp28Zp7n7Ow?e=FqUEmH. The image is at https://hub.docker.com/layers/shayweissman1964/myrepo/tritonclient_seetrain/images/sha256-7824af90285b63767f391a45e1565e567062a709d90879a7951bd1c020a84c36?context=repo

nardaweissman commented 11 months ago

Issue solved.