CC @kthui: changing the default to KIND_MODEL here may resolve the test failures we see when landing on multi-GPU nodes, and it is generally friendlier to users who customize their tensor-parallel/pipeline-parallel (TP/PP) settings to > 1 after generating a model as a starting point.
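For illustration, the relevant stanza of a generated `config.pbtxt` with this default would look roughly like the following (only the `instance_group` block is shown; everything else omitted):

```
# With KIND_MODEL, Triton creates a single instance and lets the vLLM
# backend handle GPU placement itself, which is what TP/PP > 1 needs.
# KIND_GPU would instead pin one instance per visible GPU.
instance_group [
  {
    count: 1
    kind: KIND_MODEL
  }
]
```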
Add Llama3.1-8B support for vLLM (not TRT-LLM yet), and use KIND_MODEL by default in the vLLM-generated `config.pbtxt` to avoid multi-GPU issues.

Note: Llama3.1 support requires a newer vLLM version than the one shipped in the 24.07 release.
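For context, the vLLM backend reads its engine arguments from the model's `model.json`; a hypothetical Llama3.1-8B entry with TP > 1 might look like this (the model ID and values are illustrative, not copied from this PR):

```json
{
  "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
  "tensor_parallel_size": 2,
  "gpu_memory_utilization": 0.9,
  "disable_log_requests": true
}
```

With KIND_GPU, Triton would start one instance per GPU, each trying to own the full TP group; KIND_MODEL sidesteps that conflict by leaving device assignment to the engine.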
Installing it manually in the 24.07 vLLM container did work, though:

```
pip install "vllm==0.5.3.post1"
```
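If you want that workaround baked into an image rather than applied interactively, a minimal sketch (assuming the standard 24.07 vLLM container tag from NGC):

```dockerfile
# Sketch: layer the newer vLLM on top of the 24.07 vLLM container.
FROM nvcr.io/nvidia/tritonserver:24.07-vllm-python-py3
RUN pip install "vllm==0.5.3.post1"
```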