Closed: jasonngap1 closed this issue 3 months ago.
Could you share the Docker image you use? It looks like the server does not find tensorrt_llm successfully.
Hi, I managed to solve the issue by installing tensorrt-llm with pip instead of building from source. This issue can be closed.
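For reference, a minimal sketch of the pip-based install, assuming the NVIDIA package index documented in the TensorRT-LLM README (the exact version should be pinned to match your Triton release):

```bash
# Install a prebuilt TensorRT-LLM wheel from NVIDIA's package index
# instead of building from source; leaving the version unpinned is
# an assumption -- match it to your Triton container in practice
pip3 install tensorrt_llm --extra-index-url https://pypi.nvidia.com
```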
Hi, I am trying to deploy a mistral-7b-instruct model on the Triton server, but have met with difficulties. I successfully converted my Mistral model using `trtllm-build` following the llama example in the TensorRT-LLM repo, but I am not sure how to deploy it on the Triton server. There seem to be many ways to do so; I have tried creating a tensorrt_llm backend and an ensemble backend, but neither works. Could you advise on what I should do? I would like to create an endpoint so that I can pass a prompt to the Mistral model on the Triton server and get generated text back.

Here are the steps I have done: after pulling the Mistral model weights, I converted the raw model weights into the TensorRT-LLM checkpoint format.
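A minimal sketch of this conversion step, following the TensorRT-LLM llama example (the model path, output directory, and dtype here are assumptions):

```bash
# Sketch: convert Hugging Face Mistral weights into the
# TensorRT-LLM checkpoint format using the llama example script
python3 examples/llama/convert_checkpoint.py \
    --model_dir ./Mistral-7B-Instruct-v0.2 \
    --output_dir ./tllm_checkpoint_1gpu_fp16 \
    --dtype float16
```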
I then built the engine (this produced a config.json and a rank0.engine file):
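A sketch of a typical `trtllm-build` invocation for this step; the plugin choice and the max input length are assumptions (a large `--max_input_len` is commonly passed for Mistral's long-context support):

```bash
# Sketch: build the TensorRT engine from the converted checkpoint;
# this writes config.json and rank0.engine into --output_dir
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu_fp16 \
    --output_dir ./mistral_engine \
    --gemm_plugin float16 \
    --max_input_len 32256
```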
I then pulled the latest Triton Server image (version 24.02) and tried to deploy the tensorrt_llm model, but met with this error:
UNAVAILABLE: Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the model configuration
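This error generally means the container that was launched does not ship the TensorRT-LLM backend library. One hedged fix, assuming Triton 24.02, is to run the image variant that bundles the backend rather than the base tritonserver image (the model repository path below is an assumption):

```bash
# Sketch: the -trtllm-python-py3 image variant includes the
# tensorrtllm backend; mount your model repository and start Triton
docker run --rm --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v "$(pwd)/model_repo:/models" \
    nvcr.io/nvidia/tritonserver:24.02-trtllm-python-py3 \
    tritonserver --model-repository=/models
```

Alternatively, as the error message itself suggests, the `runtime` field in the model's config.pbtxt can point Triton at the backend library explicitly.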