chenchunhui97 opened this issue 2 months ago
I get the exact same error using the tritonserver:24.05-trtllm-python-py3 container on an A100.
Set triton_backend to 'tensorrtllm' in the config.pbtxt for tensorrt_llm and it should work.
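For anyone hitting this, here is a minimal sketch of what that change looks like in the tensorrt_llm model's config.pbtxt (assuming the v0.10.0+ template, where the backend field is filled in from the triton_backend variable; all other fields omitted):

```
# Sketch only: the relevant lines of tensorrt_llm/config.pbtxt after
# filling the template. The shipped file has backend: "${triton_backend}".
name: "tensorrt_llm"
backend: "tensorrtllm"  # C++ backend; "python" selects model.py instead
```

Passing triton_backend:tensorrtllm to tools/fill_template.py when preparing the model repository should produce the same result.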
I think this was introduced because there is now a model.py file in tensorrt_llm/1 as of v0.10.0, but I have not come across anything explaining why this file is there or what purpose it serves compared to tensorrt_llm_bls. Maybe someone could point us in the right direction regarding the need for this new parameter and the new model.py file.
Thank you for the comments, @here4dadata. Your comment is correct. Some additional comments: model.py is the Python backend for running tensorrt_llm. (In comparison, if you set triton_backend to tensorrtllm, the C++ Triton backend is used.)
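To make "Python backend" concrete: a Triton Python-backend model is a model.py that exposes a TritonPythonModel class. The sketch below shows only that generic interface, not the actual contents of tensorrt_llm/1/model.py, which wraps the TensorRT-LLM runtime inside these same entry points:

```python
# Generic sketch of the Triton Python-backend interface.
import json

import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def initialize(self, args):
        # args["model_config"] is the config.pbtxt serialized as JSON;
        # the real model.py loads the TensorRT-LLM engine here.
        self.model_config = json.loads(args["model_config"])

    def execute(self, requests):
        responses = []
        for request in requests:
            # The real model.py reads input tensors, e.g.:
            #   ids = pb_utils.get_input_tensor_by_name(request, "input_ids")
            # runs generation, and attaches the output tensors here.
            responses.append(pb_utils.InferenceResponse(output_tensors=[]))
        return responses

    def finalize(self):
        pass
```

Setting triton_backend to python loads this file; setting it to tensorrtllm bypasses it in favor of the in-process C++ backend, which appears to be why the error goes away.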
System Info
Who can help?
@byshiue @sc
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
Model name: Qwen1.5-14b-Chat
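A typical way to launch the service with tensorrtllm_backend is shown below (a sketch, not taken from this report; the model repository path is an assumption):

```bash
# Illustrative paths; adjust to your checkout and engine location.
python3 scripts/launch_triton_server.py --world_size=1 \
    --model_repo=all_models/inflight_batcher_llm
```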
Expected behavior
Launch the service successfully.
actual behavior
additional notes