- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Follow this blog, but use Meta's Llama 3 70B Instruct model and adjust for running on 8 GPUs.
Expected behavior
I went through the blog about two weeks ago using Meta's Llama 3 70B model (again, adjusting for the different model and the 8 GPUs), and it finished fine; I was able to host and query the server. I'd expect the same to happen with the Instruct model.
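For reference, this is roughly how I was querying the server once it was up. It's a minimal sketch: the port (8000) and the `ensemble` model name are the defaults from the tensorrtllm_backend setup, so adjust them if your deployment differs.

```python
import json
import urllib.request
from urllib.error import URLError

# Default HTTP port and ensemble model name from the tensorrtllm_backend
# setup; adjust for your deployment.
url = "http://localhost:8000/v2/models/ensemble/generate"
payload = {"text_input": "What is machine learning?", "max_tokens": 64}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.read().decode())
except URLError as e:
    # Printed when no server is listening on the given port.
    print(f"Server not reachable: {e}")
```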
actual behavior
Instead, I'm getting the following message when I try to run `launch_triton_server.py`:

```
backend_model.cc:691] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'lora_config' not found
```
I'm not really sure why I'm getting this error message now; where should this `lora_config` be located? Why am I getting different behavior compared to the regular non-Instruct model?
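For what it's worth, `[json.exception.out_of_range.403]` is the nlohmann::json error raised when a required key lookup fails on a JSON document the backend loads, so one way to narrow this down is to scan the generated engine's `config.json` for the key. A minimal sketch (the engine path is a placeholder for wherever your build put the engine):

```python
import json
from pathlib import Path

def find_key(obj, key, prefix=""):
    """Recursively list every JSON-pointer-style path where `key` occurs."""
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            path = f"{prefix}/{k}"
            if k == key:
                hits.append(path)
            hits.extend(find_key(v, key, path))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            hits.extend(find_key(v, key, f"{prefix}[{i}]"))
    return hits

# Placeholder path: point this at the engine directory you pass to the
# backend before launching Triton.
config_path = Path("/path/to/engine_dir/config.json")
if config_path.exists():
    config = json.loads(config_path.read_text())
    print(find_key(config, "lora_config") or "lora_config not found anywhere")
```

If the key is missing entirely, that would suggest the engine was built by a TensorRT-LLM version whose config layout the backend doesn't expect.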
additional notes
I've gone in circles trying to get this to work for Instruct (I haven't tried the non-Instruct model since then; maybe it won't work now either). I tried to do it on my own without following the blog, but I kept getting errors, only to realize that the current TensorRT-LLM and tensorrtllm_backend versions are incompatible with each other. And even after pinning the versions suggested by the blog, I still can't get things to work.
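In case it helps with reproducing, this is how I've been checking which TensorRT-LLM wheel is actually installed, to compare against the tensorrtllm_backend tag (this assumes the wheel is installed under the package name `tensorrt_llm`):

```python
# Print the installed TensorRT-LLM wheel version, if any, so it can be
# compared against the tensorrtllm_backend checkout.
from importlib.metadata import version, PackageNotFoundError

try:
    print("tensorrt_llm", version("tensorrt_llm"))
except PackageNotFoundError:
    print("tensorrt_llm is not installed in this environment")
```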
System Info
I'm using Ubuntu 22.04 and 8x NVIDIA H100s