triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Fix model infer on TRT LLM with negative ints, and minor cleanup #28

Closed rmccorm4 closed 8 months ago

rmccorm4 commented 8 months ago

TRT LLM has TYPE_INT32 params like max_tokens that can't be negative, but the client script was sending random ints that could be negative. Updated to clamp the values to (0, 128) to avoid similar issues in the future, and added some handling for the TRT LLM args in the LLM data generator.
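A minimal sketch of the idea, assuming a hypothetical helper in the data generator (names here are illustrative, not the actual triton_cli code): generate random values for TYPE_INT32 params from a non-negative range rather than the full int32 range.

```python
import random

# Illustrative sketch: TRT LLM rejects negative values for TYPE_INT32
# params like max_tokens, so generate random values clamped to [0, 128]
# instead of sampling from the full signed int32 range.
def random_int32_param(low: int = 0, high: int = 128) -> int:
    return random.randint(low, high)

# Every generated value stays in the safe range.
values = [random_int32_param() for _ in range(1000)]
assert all(0 <= v <= 128 for v in values)
```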

Moved verbose to a top-level arg so it's accessible by all subcommands (note: it needs to be passed as triton -v subcommand ...
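As a hedged illustration of why the flag must precede the subcommand (this is a generic argparse sketch, not the actual triton_cli parser): flags defined on the top-level parser are shared by all subcommands, but argparse only recognizes them before the subcommand name.

```python
import argparse

# Sketch of a top-level --verbose flag shared across subcommands.
# The subcommand name ("infer" here) is illustrative.
parser = argparse.ArgumentParser(prog="triton")
parser.add_argument("-v", "--verbose", action="store_true")
subparsers = parser.add_subparsers(dest="command")
infer = subparsers.add_parser("infer")
infer.add_argument("-m", "--model")

# Top-level flags must appear before the subcommand:
args = parser.parse_args(["-v", "infer", "-m", "llama-2-7b"])
assert args.verbose and args.command == "infer"
```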

Tweaked versions and the order of installs in the Dockerfile. I think this fixes the accelerate/mistral issues, but I didn't verify thoroughly.

```
triton repo add -m llama-2-7b --backend tensorrtllm
triton server start
triton model infer -m llama-2-7b --prompt "Convert the number 71 into words:"
```