triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Feature request: Flag to indicate end of stream #87

Open yunfeng-scale opened 11 months ago

yunfeng-scale commented 11 months ago

Hi, during request streaming it would be helpful to have a flag that indicates the end of generation. Can you help with this feature request?

I believe that means returning the bool flag from https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/inflight_batcher_llm/src/libtensorrtllm.cc#L892

Ref: https://github.com/NVIDIA/TensorRT-LLM/issues/240
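To illustrate what's being asked for, here is a minimal sketch of the desired streaming protocol from the client's point of view. All names here (`StreamChunk`, `is_final`, `stream_generation`) are hypothetical, not part of the backend's actual API: each streamed response would carry an explicit boolean flag marking the end of generation, so a client can stop consuming the stream without relying on timeouts or sentinel tokens.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class StreamChunk:
    """One streamed response; names are illustrative, not the backend's API."""
    text: str
    is_final: bool  # the requested end-of-stream flag


def stream_generation(tokens: List[str]) -> Iterator[StreamChunk]:
    """Simulate a server streaming tokens; the last chunk sets is_final=True."""
    for i, tok in enumerate(tokens):
        yield StreamChunk(text=tok, is_final=(i == len(tokens) - 1))


def collect(chunks: Iterator[StreamChunk]) -> str:
    """Client loop: accumulate text and stop as soon as is_final is seen."""
    out = []
    for chunk in chunks:
        out.append(chunk.text)
        if chunk.is_final:
            break  # no need to wait for a timeout or a sentinel token
    return "".join(out)
```

With such a flag, the client loop terminates deterministically on the final chunk rather than guessing when generation has completed.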

ncomly-nvidia commented 10 months ago

Thanks for making this issue, @yunfeng-scale ! We'll get back to you once we have a chance to look into this. And we're always happy to support community MRs too!

Chevolier commented 7 months ago

Also need this feature!