triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Can TIS run both vllm and torch backend together? #7637

Closed k0286 closed 1 month ago

k0286 commented 1 month ago

Is your feature request related to a problem? Please describe.
I have some torch models on TIS, and now I want to add an LLM model.

I noticed that TIS supports vLLM, but none of the Triton images on NGC support both the vLLM and torch backends.

Describe the solution you'd like
Provide a Triton image that supports both the vLLM and torch backends.

Describe alternatives you've considered
Alternatively, explain why the vLLM and torch backends cannot be supported at the same time.
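To illustrate the target setup, a model repository served by a single Triton instance with both backends would look roughly like this (the model names and files below are placeholders, not from an existing deployment):

```
model_repository/
├── my_torch_model/          # served by the PyTorch (libtorch) backend
│   ├── config.pbtxt         # contains: backend: "pytorch"
│   └── 1/
│       └── model.pt         # TorchScript model
└── my_llm/                  # served by the vLLM backend
    ├── config.pbtxt         # contains: backend: "vllm"
    └── 1/
        └── model.json       # vLLM engine arguments, e.g. the model name to load
```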

tanmayv25 commented 1 month ago

@k0286 Ideally, you should be able to build a container image with both the vLLM and Torch backends.

Triton supports numerous backends, which would lead to a very large number of container image combinations. Additionally, the vLLM dependency is large, so to avoid further increasing the image size, the vLLM container image does not carry any other backends.

However, for your use case, you can start with the Triton container image that includes the PyTorch backend and install the vLLM backend into it. See the instructions here on how to build this image.
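A minimal sketch of what such a combined image could look like, assuming the standard `-py3` image (which already ships the PyTorch backend) as the base and the `vllm_backend` repository as the source of the backend. The image tag, paths, and pip step are assumptions on my side; check the vLLM backend instructions linked above for the versions they actually support:

```dockerfile
# Sketch only: base image tag and paths are assumptions, not an official recipe.
FROM nvcr.io/nvidia/tritonserver:24.08-py3

# Tools needed to fetch the backend sources.
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

# Install vLLM into the container's Python environment.
RUN pip install vllm

# The vLLM backend is Python-based: placing its sources under
# /opt/tritonserver/backends/vllm lets models declare backend: "vllm".
RUN mkdir -p /opt/tritonserver/backends/vllm && \
    git clone https://github.com/triton-inference-server/vllm_backend.git /tmp/vllm_backend && \
    cp -r /tmp/vllm_backend/src/* /opt/tritonserver/backends/vllm/ && \
    rm -rf /tmp/vllm_backend
```

After building (e.g. `docker build -t tritonserver-vllm-torch .`), you should be able to point `tritonserver --model-repository=...` at a repository that contains both PyTorch and vLLM models.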

k0286 commented 1 month ago

Ty, I'll give it a try!