Closed hackassin closed 2 months ago
It is not supported yet. We are working on it. There will also be a standard solution to integrate TRT-LLM with the Python backend of Triton soon.
Could you please tell me when it will be available? I am looking forward to it. Thanks!
Just to be clear, you can deploy the C++ trt-llm backend (with in-flight batching capabilities) in Triton and then use a Python client to send requests to it. Example clients can be found at https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/inflight_batcher_llm/client
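For anyone looking for a starting point, here is a minimal sketch of such a Python client using the `tritonclient` gRPC API. The model name (`ensemble`) and tensor names (`text_input`, `max_tokens`, `text_output`) are assumptions based on the ensemble model shipped in the tensorrtllm_backend repo and may differ in your deployment; check the linked example clients for the exact names your config uses.

```python
# Hypothetical Python client for a Triton-hosted trt-llm deployment.
# Tensor/model names are assumptions; adjust to your config.pbtxt.
import numpy as np


def build_inputs(prompt: str, max_tokens: int = 64):
    """Prepare batched input arrays in the [1, 1] shapes Triton expects."""
    text = np.array([[prompt.encode("utf-8")]], dtype=object)  # BYTES tensor
    length = np.array([[max_tokens]], dtype=np.int32)          # INT32 tensor
    return text, length


def infer(url: str, prompt: str, max_tokens: int = 64):
    # Imported here so build_inputs stays usable without tritonclient installed.
    import tritonclient.grpc as grpcclient

    text, length = build_inputs(prompt, max_tokens)
    client = grpcclient.InferenceServerClient(url=url)
    inputs = [
        grpcclient.InferInput("text_input", text.shape, "BYTES"),
        grpcclient.InferInput("max_tokens", length.shape, "INT32"),
    ]
    inputs[0].set_data_from_numpy(text)
    inputs[1].set_data_from_numpy(length)
    outputs = [grpcclient.InferRequestedOutput("text_output")]
    result = client.infer("ensemble", inputs=inputs, outputs=outputs)
    return result.as_numpy("text_output")


if __name__ == "__main__":
    # Assumes Triton is serving on the default gRPC port 8001.
    print(infer("localhost:8001", "What is Triton?"))
```

The decoupled/streaming mode (needed to see tokens as in-flight batching produces them) requires the gRPC streaming API instead of a plain `infer` call; see the `inflight_batcher_llm` client examples linked above for that variant.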
We have also added a Python BLS backend that can be used to implement more complex logic when orchestrating the preprocessor, the trt-llm backend and the postprocessor. See https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm_bls/1/model.py
If you are asking about a Python backend with the same functionality as the C++ trt-llm backend in https://github.com/triton-inference-server/tensorrtllm_backend/tree/main/all_models/inflight_batcher_llm/tensorrt_llm, that is still under development; we are actively working on it but cannot commit to a date yet.
Hi, any update on the Python backend? Its absence could be a blocker for adopting some JSON formatter packages.
The Python backend for in-flight batching can be found here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/1/model.py
It uses the Python bindings to the C++ executor API.
Hi Team,
Any updates on in-flight batching support in Triton via a Python client?
Thanks!