triton-inference-server / triton_cli

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Bring back IFB default to TRT LLM models and bump to 24.01 #31

Closed · rmccorm4 closed this 8 months ago
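For context, "IFB" here is in-flight batching, the TensorRT-LLM backend's dynamic batching mode that this PR restores as the default for TRT-LLM models. A minimal sketch of what setting that default could look like when generating a model repository, assuming the tensorrtllm_backend convention of a `gpt_model_type` parameter in `config.pbtxt` (the helper and the exact templating are illustrative, not the CLI's actual code):

```python
from pathlib import Path

# Batching strategy parameter block used by the TensorRT-LLM backend's
# config.pbtxt. "inflight_fused_batching" enables IFB; "V1" falls back to
# static batching. The parameter name and values follow the
# tensorrtllm_backend convention and are assumptions here.
BATCHING_PARAMETER = """
parameters: {
  key: "gpt_model_type"
  value: {
    string_value: "%s"
  }
}
"""

def write_batching_strategy(config_path: Path,
                            strategy: str = "inflight_fused_batching") -> None:
    # Append the batching-strategy parameter to a generated config.pbtxt,
    # defaulting to in-flight batching.
    with config_path.open("a") as f:
        f.write(BATCHING_PARAMETER % strategy)
```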

rmccorm4 commented 8 months ago

> Some of our server messages are logged at verbose level 2 (AFAIK nothing higher than that). Is there any value in bumping the default level to 2 since the user is explicitly asking for verbose logging?

Great question! Totally possible to update in the future, but personally I find level 2+ to be overkill for most cases. For example, it currently prints constantly for any sequence batching model even when no inference is running, and gets very noisy when caching is enabled too. Happy to revisit later if needed.
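For reference, `tritonserver` controls verbose logging with `--log-verbose=<level>`: level 1 enables verbose output, and higher levels add the noisier sequence-batching and cache messages mentioned above. A minimal sketch of how a CLI's boolean verbose option could map onto that via a subprocess launch (the CLI-side flag and helper name are illustrative, not the project's actual implementation):

```python
import subprocess

def launch_tritonserver(model_repository: str, verbose: bool = False) -> subprocess.Popen:
    """Start tritonserver, mapping a boolean verbose option to --log-verbose=1.

    Level 1 is used as the default verbose level; level 2+ is left to users
    who explicitly want the extra sequence-batching/cache logging.
    """
    cmd = [
        "tritonserver",
        f"--model-repository={model_repository}",
        f"--log-verbose={1 if verbose else 0}",
    ]
    return subprocess.Popen(cmd)
```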