triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

How to deploy a multimodal model like LLaVA on Triton server? tensorrtllm_backend does not support multimodal #507

Closed lss15151161 closed 5 days ago

byshiue commented 5 days ago

The multimodal example for the backend is still in progress, because the backend for multimodal models is very different from a pure decoder (like GPT), as mentioned in https://github.com/triton-inference-server/tensorrtllm_backend/issues/463. Closing this issue to prevent duplication.