Closed: LouisCastricato closed this 3 months ago.
Hi, thank you for your feature request.
The TensorRT-LLM team's current guidance is to use the TensorRT-LLM Backend to run models on Triton.
We're also developing an example that demonstrates integrating TensorRT-LLM with PyTriton, which should help clarify this process.
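In the meantime, here is a rough sketch of what that integration could look like. It binds a TensorRT-LLM engine to a PyTriton endpoint, assuming TensorRT-LLM's `ModelRunner` API as used in its example scripts; the engine path, tokenizer name, model name, and generation parameters are all placeholders, so check them against your installed TensorRT-LLM version:

```python
import numpy as np
import torch
from transformers import AutoTokenizer

from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

from tensorrt_llm.runtime import ModelRunner

# Placeholders -- point these at your own engine and tokenizer.
ENGINE_DIR = "/path/to/trt_llm_engine"
TOKENIZER_NAME = "meta-llama/Llama-2-7b-hf"

tokenizer = AutoTokenizer.from_pretrained(TOKENIZER_NAME)
runner = ModelRunner.from_dir(engine_dir=ENGINE_DIR)


@batch
def _infer_fn(prompts: np.ndarray):
    # PyTriton delivers bytes tensors; decode each row to a Python string.
    texts = [p[0].decode("utf-8") for p in prompts]
    batch_input_ids = [
        torch.tensor(tokenizer.encode(t), dtype=torch.int32) for t in texts
    ]
    with torch.no_grad():
        output_ids = runner.generate(
            batch_input_ids,
            max_new_tokens=128,
            end_id=tokenizer.eos_token_id,
            pad_id=tokenizer.pad_token_id or tokenizer.eos_token_id,
        )
    # output_ids is [batch, num_beams, seq_len]; take beam 0 and strip the prompt.
    outputs = []
    for i, ids in enumerate(output_ids):
        new_tokens = ids[0][len(batch_input_ids[i]):]
        outputs.append([tokenizer.decode(new_tokens, skip_special_tokens=True)])
    return {"generated_text": np.char.encode(np.array(outputs), "utf-8")}


with Triton() as triton:
    triton.bind(
        model_name="llama2_trt_llm",  # hypothetical name for this sketch
        infer_func=_infer_fn,
        inputs=[Tensor(name="prompts", dtype=bytes, shape=(1,))],
        outputs=[Tensor(name="generated_text", dtype=bytes, shape=(1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()
```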
This issue is stale because it has been open 21 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.
TensorRT-LLM support is possible from release 0.5.0. The example was created to showcase PyTriton usage with NVIDIA TensorRT-LLM.
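For reference, querying a model served this way looks roughly like the following, using PyTriton's `ModelClient`; the model name and default HTTP port here are assumptions matching the sketch above:

```python
import numpy as np
from pytriton.client import ModelClient

# "llama2_trt_llm" and port 8000 (PyTriton's default) match the server sketch above.
with ModelClient("localhost:8000", "llama2_trt_llm") as client:
    prompts = np.array([["What is TensorRT-LLM?".encode("utf-8")]], dtype=object)
    result = client.infer_batch(prompts=prompts)
    print(result["generated_text"][0][0].decode("utf-8"))
```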
Is your feature request related to a problem? Please describe.
I can't seem to find any examples of how to deploy models that are built for tensorrt-llm. Is this possible, and am I missing the documentation for it?
Describe the solution you'd like
Either improve the documentation on how to use pytriton with tensorrt-llm, or explain why such a combination is undesirable or ill-formed.
Describe alternatives you've considered
I've looked at the OPT-Jax example and have begun experimenting with a JAX port of LLaMA 2 based on it.