triton-inference-server / model_navigator

Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
https://triton-inference-server.github.io/model_navigator/
Apache License 2.0

TensorRT-LLM Triton Backend Support #33

Open shixianc opened 10 months ago

shixianc commented 10 months ago

When can NAV support creating Triton Repo for this new backend? Is it on your roadmap? https://github.com/triton-inference-server/tensorrtllm_backend

jkosek commented 9 months ago

@shixianc thanks for the feature request. We are going to review the backend options and add support in the next release.

If there are any specific requirements you see, let us know. Thanks!

ishandhanani commented 5 months ago

Hi team! Was this ever added? I'm looking through the release notes but cannot find support for TRT-LLM.

jkosek commented 5 months ago

Hi @ishandhanani. Apologies, not yet. Let us prioritize this feature and provide an ETA.

jkosek commented 5 months ago

@ishandhanani a couple of questions to clarify the expected behavior. Do you see this feature as generating the model store for the tensorrtllm backend only (example), or would you expect the whole deployment with pre/post-processing and BLS to be created (similar to this example)?

ishandhanani commented 5 months ago

I think a good first step would be to have it generate the model repo for the trtllm backend only. In the future, it would be great if we could also generate the entire pre/post-processing model repo @jkosek

jkosek commented 1 month ago

@ishandhanani you may want to review the newly added TensorRTLLMModelConfig class, which specifies the TensorRT-LLM backend configuration: https://triton-inference-server.github.io/model_navigator/0.11.0/inference_deployment/triton/api/specialized_configs/#model_navigator.triton.TensorRTLLMModelConfig
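
A rough usage sketch for anyone following along: the `add_model()` call below is assumed to come from Navigator's Triton model store API, and the engine path and bare `TensorRTLLMModelConfig()` defaults are placeholders, so please check the linked 0.11.0 reference for the actual arguments.

```python
import pathlib

import model_navigator as nav

# Sketch only: configure the TensorRT-LLM backend entry for the model store.
# The config options (and whether the defaults are sufficient) should be taken
# from the TensorRTLLMModelConfig reference linked above.
config = nav.triton.TensorRTLLMModelConfig()

# Assumed usage of the Triton model store API: write a model entry that points
# at a directory of pre-built TensorRT-LLM engines.
nav.triton.model_repository.add_model(
    model_repository_path=pathlib.Path("model_repository"),  # target Triton model store
    model_name="tensorrt_llm",                                # served model name
    model_path=pathlib.Path("trt_llm_engines"),               # placeholder engine directory
    config=config,
)
```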