Open ShuaiShao93 opened 2 weeks ago
System Info
The tensorrtllm backend doesn't work for us because of this bug: https://github.com/triton-inference-server/tensorrtllm_backend/issues/598, so I have to use the python backend instead. However, the python backend only supports detached mode, which we don't need.
Can we add support for non-detached mode?
Who can help?
@ncomly-nvidia
Information
N/A
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
N/A
Expected behavior
N/A
Actual behavior
N/A
Additional notes
N/A