triton-inference-server / triton_cli

Triton CLI is an open-source command-line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Stream LLM #86

Open EgyipTomi425 opened 6 days ago

EgyipTomi425 commented 6 days ago

Hi. I'm using Llama 3 and I'd like to stream the output. I assume this should be done somehow through the API on port 8001. I'd like to generate a few tokens at a time and send them to the client as they are produced. Is this possible? It would help me a lot. Have a good day.