triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Support non-detached mode for python trtllm backend #639

Open ShuaiShao93 opened 2 weeks ago

System Info

The tensorrtllm backend doesn't work for us because of this bug: https://github.com/triton-inference-server/tensorrtllm_backend/issues/598, so we have to use the python backend instead. However, the python backend only supports detached mode, which we don't need.

Can we add support for non-detached mode?
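For context, a rough sketch of the distinction being requested (the names below are purely illustrative and are not the actual tensorrtllm_backend API): in detached mode the engine runs in a separate process that the backend talks to, while in non-detached mode the engine call runs directly in the serving process.

```python
import subprocess
import sys

def run_inference(prompt: str) -> str:
    # Stand-in for the actual TRT-LLM engine call (hypothetical).
    return "echo:" + prompt

class Runner:
    """Illustrative only: contrasts detached vs non-detached execution."""

    def __init__(self, detached: bool):
        self.detached = detached

    def infer(self, prompt: str) -> str:
        if self.detached:
            # Detached mode: the work happens in a separate process;
            # the parent only exchanges serialized data with it.
            out = subprocess.run(
                [sys.executable, "-c", f"print('echo:' + {prompt!r})"],
                capture_output=True, text=True, check=True)
            return out.stdout.strip()
        # Non-detached mode: the engine call runs in-process,
        # sharing the caller's memory and lifecycle.
        return run_inference(prompt)
```

In non-detached mode there is no extra process to manage or IPC overhead, which is why a deployment that doesn't need process isolation may prefer it.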

Who can help?

@ncomly-nvidia

Reproduction

N/A

Expected behavior

N/A

Actual behavior

N/A

Additional notes

N/A