triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License
8.03k stars 1.44k forks

Triton needs API docs like vLLM's FastAPI docs #7518

Open kinglion811 opened 1 month ago

kinglion811 commented 1 month ago

Recently, I have been using Triton to deploy my model service and found that Triton provides only an SDK, with no API documentation. To find the API, I have to read the SDK client code, which is very unfriendly. I hope Triton can provide API documentation like vLLM does.
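
To make the request concrete, here is a rough sketch of the kind of call that API documentation would need to describe. It assumes Triton's HTTP endpoint follows the KServe v2 inference protocol (`POST /v2/models/<model>/infer`) on the default HTTP port 8000. The model name `my_model`, the input name `INPUT0`, and the shape and data are hypothetical placeholders, not taken from any real deployment. It only builds the request; `send_infer_request` is left uncalled because it needs a running server.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, shape, datatype, data):
    """Build (url, body) for a KServe-v2-style inference POST.

    All argument values are supplied by the caller; nothing here is
    discovered from the server.
    """
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,    # must match the model's input tensor name
                "shape": shape,        # tensor shape, e.g. [batch, features]
                "datatype": datatype,  # e.g. "FP32", "INT64"
                "data": data,          # flattened tensor values
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def send_infer_request(url, body, timeout=10.0):
    """POST the request and return the decoded JSON response (needs a live server)."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical model and input names, for illustration only.
url, body = build_infer_request(
    "http://localhost:8000", "my_model", "INPUT0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4]
)
print(url)
print(body.decode("utf-8"))
```

With a server running and a matching model loaded, `send_infer_request(url, body)` should return the JSON response, but the exact tensor names and datatypes depend on the model's configuration.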

balusch commented 3 weeks ago

Same here. It would be better if triton-server could provide an SDK in addition to the standalone server, so I can integrate it with other HTTP servers like FastAPI to serve HTTP requests.