Open saugatapaul1010 opened 4 months ago
CC @fpetrini15 @krishung5 if you're familiar with support for the in-process API on Windows
Hi @saugatapaul1010, we are planning to add the tritonserver.lib file into the Windows assets for 24.08. This will allow you to link against the Triton C API in your build.
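In the meantime, for anyone planning ahead: once tritonserver.lib / tritonserver.dll can be linked on Windows, starting Triton in-process through the C API (tritonserver.h) would look roughly like the sketch below. This is a minimal, untested-on-Windows sketch; the repository and backend paths are the ones from this issue and are assumptions about your layout, and the include path may differ depending on how the headers are packaged in the Windows zip.

```cpp
#include <cstdlib>
#include <iostream>

#include "triton/core/tritonserver.h"  // in-process C API header

namespace {
// Print and abort on any Triton error; errors are returned, not thrown.
void Check(TRITONSERVER_Error* err)
{
  if (err != nullptr) {
    std::cerr << TRITONSERVER_ErrorCodeString(err) << ": "
              << TRITONSERVER_ErrorMessage(err) << std::endl;
    TRITONSERVER_ErrorDelete(err);
    std::exit(1);
  }
}
}  // namespace

int main()
{
  TRITONSERVER_ServerOptions* options = nullptr;
  Check(TRITONSERVER_ServerOptionsNew(&options));
  Check(TRITONSERVER_ServerOptionsSetModelRepositoryPath(
      options, "F:/Triton_Latest/models_repository"));
  // Point Triton at the directory that contains the onnxruntime backend
  // folder, instead of copying backend files into the model folder.
  Check(TRITONSERVER_ServerOptionsSetBackendDirectory(
      options,
      "F:/Triton_Latest/tools/tritonserver2.47.0-win/tritonserver/backends"));

  TRITONSERVER_Server* server = nullptr;
  Check(TRITONSERVER_ServerNew(&server, options));
  Check(TRITONSERVER_ServerOptionsDelete(options));

  bool ready = false;
  Check(TRITONSERVER_ServerIsReady(server, &ready));
  std::cout << "server ready: " << std::boolalpha << ready << std::endl;

  // ... build and submit TRITONSERVER_InferenceRequest objects here
  //     (see the inference sketch further down in this issue) ...

  Check(TRITONSERVER_ServerStop(server));
  Check(TRITONSERVER_ServerDelete(server));
  return 0;
}
```

This replaces the tritonserver.exe command line entirely: the same process that loads the model also runs your application code, so no gRPC or HTTP hop is involved.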
Description
Hi, I have set up Triton version 2.47 for Windows, along with the ONNX Runtime backend, based on the assets for Triton 2.47 listed at this URL: https://github.com/triton-inference-server/server/releases/
I have downloaded the ZIP file (tritonserver2.47.0-win.zip) and extracted its contents on my local system.
Triton Information
I am using Triton version 2.47 on Windows 10, with Python 3.10.11, CUDA 12.4, TensorRT 10.0.1.6, cuDNN 9.1.0.7, vcpkg, and the required C/C++ development SDKs.
Are you using the Triton container or did you build it yourself?
I have downloaded the ZIP file for Windows (Triton 2.47) and extracted its contents. I am following the steps mentioned at this URL: https://github.com/triton-inference-server/server/releases/tag/v2.47.0
To Reproduce
Steps to reproduce the behavior:
First of all, I had to spend a huge amount of time setting this up natively on Windows 10 because of the lack of clear instructions in the README (or anywhere else). After a lot of trial and error I stumbled on a workaround: manually copying all the files from `tritonserver2.47.0-win\tritonserver\backends\onnxruntime` into `models_repository\resnet50`, where the `resnet50` folder contains my ONNX model. This is not mentioned anywhere in the documentation. I then run the following command to check that the ONNX Runtime backend has been set up successfully:

```
tritonserver --model-repository=F:/Triton_Latest/models_repository --backend-config=onnx,dir=F:/Triton_Latest/tools/tritonserver2.47.0-win/tritonserver/backends/onnxruntime,version=1.18.0
```

When the Triton server and the ONNX Runtime backend start successfully, I get the output shown below. So the gRPC and HTTP services are up and running. However, I need to use the ResNet-50 model natively from C++ code and get a response back. Is there any C++ wrapper or functionality for this? It is not mentioned anywhere in the release notes for Triton 2.47 for Windows.
What I want to achieve is this: load the Triton Inference Server, load the ResNet-50 model into it, and run inference against that model from C++ code natively on Windows 10. Currently it is mentioned that gRPC and HTTP calls are supported. What about native, in-process inference using C++?
Describe the models (framework, inputs, outputs), ideally include the model configuration file (if using an ensemble include the model configuration file for that as well).
Below is the model configuration file for ResNet-50, for reference.
This is my CMakeLists.txt file:
This is my main C++ file (prod.cpp):
Expected behavior
What changes do I need to make, either in the settings or in the prod.cpp code, so that I can use the ONNX model at runtime natively on Windows via C++ calls, without gRPC or HTTP?
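Not an official answer, but as a rough sketch of what prod.cpp could do once linking against the C API is possible on Windows: a condensed in-process inference call against an already-created TRITONSERVER_Server* (see the startup sketch earlier in this issue) might look like the following. It assumes a non-decoupled model, CPU output buffers, and an FP32 input tensor named "data" with shape [1, 3, 224, 224]; the tensor name and shape are placeholders, not taken from the actual config.pbtxt, so substitute the values from your model configuration.

```cpp
#include <cstdlib>
#include <future>
#include <iostream>
#include <vector>

#include "triton/core/tritonserver.h"

#define CHECK_TRITON(X)                                            \
  do {                                                             \
    TRITONSERVER_Error* err__ = (X);                               \
    if (err__ != nullptr) {                                        \
      std::cerr << TRITONSERVER_ErrorMessage(err__) << std::endl;  \
      TRITONSERVER_ErrorDelete(err__);                             \
      std::exit(1);                                                \
    }                                                              \
  } while (false)

// Output allocator callbacks: Triton asks us for CPU buffers to hold outputs.
TRITONSERVER_Error* ResponseAlloc(
    TRITONSERVER_ResponseAllocator*, const char* /* tensor_name */,
    size_t byte_size, TRITONSERVER_MemoryType, int64_t, void*, void** buffer,
    void** buffer_userp, TRITONSERVER_MemoryType* actual_memory_type,
    int64_t* actual_memory_type_id)
{
  *buffer = (byte_size == 0) ? nullptr : std::malloc(byte_size);
  *buffer_userp = nullptr;
  *actual_memory_type = TRITONSERVER_MEMORY_CPU;
  *actual_memory_type_id = 0;
  return nullptr;  // nullptr == success
}

TRITONSERVER_Error* ResponseRelease(
    TRITONSERVER_ResponseAllocator*, void* buffer, void*, size_t,
    TRITONSERVER_MemoryType, int64_t)
{
  std::free(buffer);
  return nullptr;
}

// Triton hands the request back when it no longer needs it.
void RequestRelease(
    TRITONSERVER_InferenceRequest* request, const uint32_t /* flags */, void*)
{
  TRITONSERVER_InferenceRequestDelete(request);
}

// The completed response is delivered through a promise so we can wait on it.
void ResponseComplete(
    TRITONSERVER_InferenceResponse* response, const uint32_t /* flags */,
    void* userp)
{
  if (response != nullptr) {  // non-decoupled model: exactly one response
    static_cast<std::promise<TRITONSERVER_InferenceResponse*>*>(userp)
        ->set_value(response);
  }
}

void InferResnet50(TRITONSERVER_Server* server)
{
  TRITONSERVER_ResponseAllocator* allocator = nullptr;
  CHECK_TRITON(TRITONSERVER_ResponseAllocatorNew(
      &allocator, ResponseAlloc, ResponseRelease, nullptr /* start_fn */));

  TRITONSERVER_InferenceRequest* request = nullptr;
  CHECK_TRITON(TRITONSERVER_InferenceRequestNew(
      &request, server, "resnet50", -1 /* latest version */));

  // Placeholder input name/shape -- use the values from your config.pbtxt.
  std::vector<float> input(1 * 3 * 224 * 224, 0.5f);
  const int64_t shape[] = {1, 3, 224, 224};
  CHECK_TRITON(TRITONSERVER_InferenceRequestAddInput(
      request, "data", TRITONSERVER_TYPE_FP32, shape, 4));
  CHECK_TRITON(TRITONSERVER_InferenceRequestAppendInputData(
      request, "data", input.data(), input.size() * sizeof(float),
      TRITONSERVER_MEMORY_CPU, 0));
  CHECK_TRITON(TRITONSERVER_InferenceRequestSetReleaseCallback(
      request, RequestRelease, nullptr));

  std::promise<TRITONSERVER_InferenceResponse*> promise;
  auto future = promise.get_future();
  CHECK_TRITON(TRITONSERVER_InferenceRequestSetResponseCallback(
      request, allocator, nullptr /* alloc userp */, ResponseComplete,
      &promise));
  CHECK_TRITON(TRITONSERVER_ServerInferAsync(server, request, nullptr));

  TRITONSERVER_InferenceResponse* response = future.get();
  CHECK_TRITON(TRITONSERVER_InferenceResponseError(response));

  uint32_t output_count = 0;
  CHECK_TRITON(
      TRITONSERVER_InferenceResponseOutputCount(response, &output_count));
  std::cout << "received " << output_count << " output tensor(s)" << std::endl;

  TRITONSERVER_InferenceResponseDelete(response);
  TRITONSERVER_ResponseAllocatorDelete(allocator);
}
```

For a more complete reference, the in-process example in the server repository (src/simple.cc) walks through the same flow in more detail, including GPU output buffers.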