Environment:
- TensorRT-LLM: v0.9.0.dev2024040900
- Triton container: 24.03-vllm-python-py3
I need to deploy LoRA-tuned LLM models with Triton server and its tensorrt_llm backend. I built the engines with TensorRT-LLM in the following way.
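In outline (this is a sketch, assuming a Llama-style HF base checkpoint; the paths, shape limits, and LoRA target modules below are placeholders for my actual values):

```bash
# Convert the HF base checkpoint, then build an engine with the LoRA plugin
# enabled. All paths and sizes are illustrative.
python3 examples/llama/convert_checkpoint.py \
    --model_dir ./base-model-hf \
    --output_dir ./tllm_checkpoint \
    --dtype float16

trtllm-build \
    --checkpoint_dir ./tllm_checkpoint \
    --output_dir ./engine_dir \
    --gemm_plugin float16 \
    --lora_plugin float16 \
    --lora_dir ./my-lora-adapter \
    --max_lora_rank 8 \
    --lora_target_modules attn_q attn_k attn_v \
    --max_batch_size 8 \
    --max_input_len 512 \
    --max_output_len 256
```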
which leaves me with the following artifacts:
```
engine_dir/
├── rank0.engine
├── config.json
└── lora/
    └── 0/
        ├── adapter_config.json
        └── adapter_model.bin
```
To run inference I use Python:
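Something like this (a minimal sketch against the v0.9 ModelRunner API; the tokenizer path, prompt, and generation parameters are placeholders):

```python
# Minimal inference sketch against the built engine, using the
# TensorRT-LLM v0.9 ModelRunner API. Paths and parameters are illustrative.
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained("./base-model-hf")

# Point the runner at the engine and the HF LoRA adapter directory.
runner = ModelRunner.from_dir(
    engine_dir="./engine_dir",
    lora_dir=["./my-lora-adapter"],
    lora_ckpt_source="hf",
)

prompt = "Tell me about LoRA."
input_ids = [tokenizer(prompt, return_tensors="pt").input_ids.squeeze(0)]

outputs = runner.generate(
    batch_input_ids=input_ids,
    lora_uids=["0"],  # adapter uid 0, matching lora/0 in the artifacts
    max_new_tokens=128,
    end_id=tokenizer.eos_token_id,
    pad_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][0], skip_special_tokens=True))
```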
From here I want to know how to deploy these artifacts through Triton server. I tried using the tensorrt_llm backend; a rough sketch of what I did is below.
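Roughly, following the tensorrtllm_backend example model repository (paths and template values here are placeholders, and the real README fills in more template fields than shown):

```bash
# Rough sketch of the Triton setup, based on the tensorrtllm_backend
# example model repository. Paths are illustrative.
git clone -b v0.9.0 https://github.com/triton-inference-server/tensorrtllm_backend.git
cd tensorrtllm_backend

# Start from the example repo and drop the engine into the model version dir.
cp -r all_models/inflight_batcher_llm triton_model_repo
cp /path/to/engine_dir/* triton_model_repo/tensorrt_llm/1/

# Fill in the backend config template (batch size, engine location, ...).
# (The README's full command fills in additional fields.)
python3 tools/fill_template.py -i triton_model_repo/tensorrt_llm/config.pbtxt \
    "triton_max_batch_size:8,engine_dir:triton_model_repo/tensorrt_llm/1,decoupled_mode:False,batching_strategy:inflight_fused_batching,max_beam_width:1"

# Launch Triton pointing at the model repository.
python3 scripts/launch_triton_server.py --model_repo triton_model_repo
```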
That attempt failed with this error:

```
[TensorRT-LLM][ERROR] 6: The engine plan file is not compatible with this version of TensorRT, expecting library version 9.2.0.5 got 9.3.0.1, please rebuild.
```

I have tried to resolve it by rebuilding with older versions of TensorRT-LLM (v0.8.0 and v0.7.1), but their configuration files and generated artifacts do not include the LoRA artifacts (the lora folder) the way builds since v0.9.0 do.

I'm new to deploying LoRA-tuned LLM models. Am I doing something wrong? Has anyone been able to deploy these models with Triton server and make them available through an endpoint?
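For context, the error reads like a TensorRT version mismatch between the environment the engine was built in and the container serving it. A quick way to compare the two (run once in each container):

```bash
# Print the TensorRT and TensorRT-LLM versions visible to Python; run once
# in the container used for trtllm-build and once in the Triton container.
python3 -c "import tensorrt; print(tensorrt.__version__)"
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```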