TRT LLM has TYPE_INT32 params like max_tokens that can't be negative, but the client script was sending random ints that could be negative. Updated it to clamp the values to (0, 128) to avoid similar issues in the future, and added some handling for the TRT LLM args in the LLM data generator.
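The clamping fix can be sketched roughly like this (a minimal illustration, not the actual client script; the function name and raw sampling range are assumptions, only the (0, 128) clamp range comes from the change above):

```python
import random

# TRT LLM exposes TYPE_INT32 parameters such as max_tokens that must not
# be negative, so instead of sending an unbounded random int, clamp the
# sampled value into the safe range used by the updated client script.
MIN_VALUE, MAX_VALUE = 0, 128

def clamped_random_int(rng: random.Random) -> int:
    """Sample a random int32 and clamp it to [MIN_VALUE, MAX_VALUE]."""
    value = rng.randint(-(2**31), 2**31 - 1)  # raw TYPE_INT32 range
    return max(MIN_VALUE, min(MAX_VALUE, value))
```

This keeps the randomness for fuzzing-style coverage while guaranteeing the value is always valid for a non-negative TYPE_INT32 parameter.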
Moved verbose to a top-level arg so it is accessible by all subcommands (note: it needs to be passed before the subcommand, as in triton -v subcommand ...).
Tweaked versions and the order of installs in the Dockerfile. I believe this fixes the accelerate/mistral issues, but I haven't verified it thoroughly.
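The shape of the Dockerfile tweak is roughly the following (an illustrative fragment only; the actual pins and package list are not in this description, so the versions below are deliberate placeholders):

```dockerfile
# Pin versions and install in a fixed order so accelerate and its
# dependencies resolve against mutually compatible releases.
# <pinned> is a placeholder, not the version used in this change.
RUN pip install "transformers==<pinned>" && \
    pip install "accelerate==<pinned>"
```

Installing in a fixed order with pinned versions avoids pip silently upgrading a shared dependency when a later layer installs another package.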
triton repo add -m llama-2-7b --backend tensorrtllm
triton server start
triton model infer -m llama-2-7b --prompt "Convert the number 71 into words:"