- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [X] My own task or dataset (give details below)
Reproduction
Follow this blog, but use Meta's Llama 3 70B Instruct model and adjust for running on 8 GPUs.
Expected behavior
I went through the blog about two weeks ago using Meta's Llama 3 70B model (again, adjusting for the different model and the 8 GPUs), and it finished fine; I was able to host and query the server. I'd expect the same to happen with the Instruct model.
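For reference, this is roughly how I was querying the server once it was up. It's a minimal sketch: the port (8000) and the `ensemble` model name are the defaults from the tensorrtllm_backend setup, so adjust them if your deployment differs.

```python
import json
import urllib.request
from urllib.error import URLError

# Default HTTP port and ensemble model name from the tensorrtllm_backend
# setup; adjust for your deployment.
url = "http://localhost:8000/v2/models/ensemble/generate"
payload = {"text_input": "What is machine learning?", "max_tokens": 64}

req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(resp.read().decode())
except URLError as e:
    # Printed when no server is listening on the given port.
    print(f"Server not reachable: {e}")
```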
actual behavior
Instead, I'm getting the following message when I try to run `launch_triton_server.py`:

```
backend_model.cc:691] ERROR: Failed to create instance: unexpected error when creating modelInstanceState: [json.exception.out_of_range.403] key 'lora_config' not found
```
I'm not really sure why I'm getting this error message now; where should this `lora_config` be located? Why am I getting different behavior compared to the regular non-Instruct model?
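For what it's worth, `[json.exception.out_of_range.403]` is the nlohmann::json error raised when a required key lookup fails on a JSON document the backend loads, so one way to narrow this down is to scan the generated engine's `config.json` for the key. A minimal sketch (the engine path is a placeholder for wherever your build put the engine):

```python
import json
from pathlib import Path

def find_key(obj, key, prefix=""):
    """Recursively list every JSON-pointer-style path where `key` occurs."""
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            path = f"{prefix}/{k}"
            if k == key:
                hits.append(path)
            hits.extend(find_key(v, key, path))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            hits.extend(find_key(v, key, f"{prefix}[{i}]"))
    return hits

# Placeholder path: point this at the engine directory you pass to the
# backend before launching Triton.
config_path = Path("/path/to/engine_dir/config.json")
if config_path.exists():
    config = json.loads(config_path.read_text())
    print(find_key(config, "lora_config") or "lora_config not found anywhere")
```

If the key is missing entirely, that would suggest the engine was built by a TensorRT-LLM version whose config layout the backend doesn't expect.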
additional notes
I've gone in circles trying to get this to work for Instruct (I haven't tried the non-Instruct model since then; maybe it won't work now either). I tried to do it on my own without following the blog, but I kept getting errors, only to realize that the current TensorRT-LLM and tensorrtllm_backend versions are incompatible with each other. And even after pinning the versions suggested by the blog, I still can't get things to work.
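In case it helps with reproducing, this is how I've been checking which TensorRT-LLM wheel is actually installed, to compare against the tensorrtllm_backend tag (this assumes the wheel is installed under the package name `tensorrt_llm`):

```python
# Print the installed TensorRT-LLM wheel version, if any, so it can be
# compared against the tensorrtllm_backend checkout.
from importlib.metadata import version, PackageNotFoundError

try:
    print("tensorrt_llm", version("tensorrt_llm"))
except PackageNotFoundError:
    print("tensorrt_llm is not installed in this environment")
```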
System Info
I'm using Ubuntu 22.04 and 8x NVIDIA H100s