triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

How does Triton know which codepath to choose based on `backend` in config.pbtxt being "tensorrtllm" or "python" #420

Closed ekagra-ranjan closed 2 months ago

ekagra-ranjan commented 2 months ago

Hi,

The inflight_batcher_llm directory contains the C++ implementation of the backend supporting inflight batching, paged attention and more.

Where is the logic in Triton that switches the codepath from C++ to Python based on the `backend` key in config.pbtxt being "tensorrtllm" or "python"?

I searched for the term "tensorrtllm" in the backend repo too, but didn't find anything.

byshiue commented 2 months ago

It is defined here: https://github.com/triton-inference-server/tensorrt llm_backend/blob/main/all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt#L28

For the Python backend, it looks like this one: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/inflight_batcher_llm/postprocessing/config.pbtxt#L28
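To connect the two answers: the dispatch is done by Triton core (the server itself), not by code in this repo, which is why searching tensorrtllm_backend for "tensorrtllm" turns up nothing. Triton reads the `backend` field from each model's config.pbtxt and loads the matching backend shared library from its backends directory; a value of "python" routes the model to the Python backend, which executes the model.py in the model directory. A minimal sketch of the two config variants (only the relevant fields shown; other settings elided):

```protobuf
# C++ codepath: Triton loads the TensorRT-LLM backend library
# and this model is served by the inflight_batcher_llm C++ code.
name: "tensorrt_llm"
backend: "tensorrtllm"

# Python codepath: Triton loads the Python backend, which runs
# the model.py found alongside this config.pbtxt.
name: "postprocessing"
backend: "python"
```

In other words, changing the `backend` string in config.pbtxt is the whole switch; the resolution from that string to a backend implementation happens inside the Triton server at model-load time.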