triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Support non-detached mode for python trtllm backend #639

Open ShuaiShao93 opened 2 weeks ago

System Info

The tensorrtllm backend doesn't work for us because of this bug: https://github.com/triton-inference-server/tensorrtllm_backend/issues/598, so we have to use the python backend instead. However, the python backend only supports detached mode, which we don't need.

Can we add support for non-detached mode?
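For context, a rough sketch of the distinction being requested (the names below are purely illustrative and are not the actual tensorrtllm_backend API): in detached mode the engine runs in a separate process that the backend talks to, while in non-detached mode the engine call runs directly in the serving process.

```python
import subprocess
import sys

def run_inference(prompt: str) -> str:
    # Stand-in for the actual TRT-LLM engine call (hypothetical).
    return "echo:" + prompt

class Runner:
    """Illustrative only: contrasts detached vs non-detached execution."""

    def __init__(self, detached: bool):
        self.detached = detached

    def infer(self, prompt: str) -> str:
        if self.detached:
            # Detached mode: the work happens in a separate process;
            # the parent only exchanges serialized data with it.
            out = subprocess.run(
                [sys.executable, "-c", f"print('echo:' + {prompt!r})"],
                capture_output=True, text=True, check=True)
            return out.stdout.strip()
        # Non-detached mode: the engine call runs in-process,
        # sharing the caller's memory and lifecycle.
        return run_inference(prompt)
```

In non-detached mode there is no extra process to manage or IPC overhead, which is why a deployment that doesn't need process isolation may prefer it.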

Who can help?

@ncomly-nvidia

Reproduction

N/A

Expected behavior

N/A

Actual behavior

N/A

Additional notes

N/A