triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend

Invalid argument: unable to find backend library for backend '${triton_backend}' #526

Open chenchunhui97 opened 2 months ago

chenchunhui97 commented 2 months ago

System Info

Who can help?

@byshiue @sc

Information

Tasks

Reproduction

Model name: Qwen1.5-14b-Chat

  1. Generate the engine following the steps in the TensorRT-LLM README. Succeeded.
  2. Launch the service using Triton. Failed (a rough sketch of both steps is shown below).
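
For reference, a rough sketch of the two steps as run here; the paths, checkpoint/engine directories, and build flags are illustrative (they follow the Qwen example in TensorRT-LLM), so adjust them to your setup:

```bash
# Step 1: convert the checkpoint and build the engine with TensorRT-LLM
# (directories and dtype are illustrative)
python3 examples/qwen/convert_checkpoint.py \
    --model_dir ./Qwen1.5-14B-Chat \
    --output_dir ./qwen_ckpt \
    --dtype float16
trtllm-build \
    --checkpoint_dir ./qwen_ckpt \
    --output_dir ./qwen_engine \
    --gemm_plugin float16

# Step 2: copy the engine into the Triton model repository and launch the server
cp ./qwen_engine/* all_models/inflight_batcher_llm/tensorrt_llm/1/
python3 scripts/launch_triton_server.py \
    --world_size 1 \
    --model_repo all_models/inflight_batcher_llm
```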

Expected behavior

Launch the service successfully.

actual behavior

[screenshot: Triton startup log showing `Invalid argument: unable to find backend library for backend '${triton_backend}'`]

additional notes

[screenshot attached]

alemantus commented 2 months ago

I get the exact same error using the tritonserver:24.05-trtllm-python-py3 container on an A100.

here4dadata commented 2 months ago

Set triton_backend to 'tensorrtllm' in the config.pbtxt for tensorrt_llm and it should work.
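
As a minimal sketch of what that change looks like (assuming the standard all_models/inflight_batcher_llm layout; other template variables omitted):

```
# all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
# The shipped template leaves the backend field as a placeholder, and Triton
# reports "unable to find backend library for backend '${triton_backend}'"
# if it is never substituted:
#   backend: "${triton_backend}"
# Setting it explicitly selects the C++ TensorRT-LLM backend:
backend: "tensorrtllm"
```

The same substitution can also be applied with tools/fill_template.py (passing triton_backend:tensorrtllm along with the other values) instead of editing the file by hand.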

I think this was introduced because there is now a model.py file in tensorrt_llm/1 as of v0.10.0, but I have not come across anything explaining why that file is there or what purpose it serves compared to tensorrt_llm_bls.

Maybe someone could point us in the right direction regarding the need for this new parameter and the new model.py file.

byshiue commented 2 months ago

Thank you for the comments, @here4dadata. Your comment is correct. Some additional context: model.py is the Python backend implementation for running tensorrt_llm. (In comparison, if you set triton_backend to tensorrtllm, the C++ Triton backend is used.)
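
For readers landing here, the two choices can be expressed as fill_template.py substitutions; treat this as a sketch (other required substitutions, such as engine_dir and batch size, are omitted):

```bash
# C++ TensorRT-LLM backend; tensorrt_llm/1/model.py is not used in this mode
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
    triton_backend:tensorrtllm

# Python backend: Triton loads tensorrt_llm/1/model.py, which drives TensorRT-LLM from Python
python3 tools/fill_template.py -i all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt \
    triton_backend:python
```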