Closed ekagra-ranjan closed 2 months ago
It is defined here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt#L28
For the Python backend, it looks like this one: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/postprocessing/config.pbtxt#L28
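For illustration, here is a minimal sketch of the relevant fragments of those two `config.pbtxt` files (the field values are examples, not the complete files; see the linked lines for the authoritative versions):

```protobuf
# Sketch of the tensorrt_llm model's config.pbtxt: the `backend` field
# names the C++ backend shared library Triton should load for this model.
name: "tensorrt_llm"
backend: "tensorrtllm"

# Sketch of the postprocessing model's config.pbtxt: "python" routes the
# model through Triton's Python backend, which executes the model.py
# in the model directory.
name: "postprocessing"
backend: "python"
```

The dispatch itself lives in the Triton server core, not in this repository: Triton resolves the `backend` name from `config.pbtxt` to a backend shared library (conventionally `libtriton_<name>.so` under the server's backends directory) and loads it, which is why searching this repo for the string "tensorrtllm" does not turn up the switching logic.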
Hi,
The inflight_batcher_llm directory contains the C++ implementation of the backend, supporting inflight batching, paged attention, and more.
Where is the logic in Triton that switches the code path from C++ to Python based on the `backend` key in config.pbtxt being "tensorrtllm" or "python"? I searched for the term "tensorrtllm" in the backend repo too, but didn't find anything.