triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

How is GptManager used in Triton backend? #421

Open ekagra-ranjan opened 5 months ago

ekagra-ranjan commented 5 months ago

I see that the Triton backend creates an object of GptManager which gets passed the engine dir. However, I am unable to see any code that shows how this GptManager is being called. All I can see is the backend calling some Triton functions, but the GptManager is not a function argument to those calls, so I am curious how the engine is invoked from the Triton backend.

Can I please get some pointers to the code which does this?

Thanks!

byshiue commented 5 months ago

GptManager is declared in this header file https://github.com/triton-inference-server/tensorrtllm_backend/blob/bf5e9007a3f16c7fc76cb156a3362d1caae198dd/inflight_batcher_llm/src/model_instance_state.h#L39, but the implementation is not open source.