yunfeng-scale opened this issue 1 year ago (status: Open)
Hi, during request streaming it would be helpful to have a flag that indicates the end of generation. Can you help with this feature request?
I believe that means returning the bool flag from https://github.com/triton-inference-server/tensorrtllm_backend/blob/release/0.5.0/inflight_batcher_llm/src/libtensorrtllm.cc#L892
Ref: https://github.com/NVIDIA/TensorRT-LLM/issues/240
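For context, Triton's decoupled backend API already carries a completion marker at the response level: a backend can set `TRITONSERVER_RESPONSE_COMPLETE_FINAL` when it sends the last response for a request. Below is a minimal, hedged sketch of how a streaming backend could mark its final response, assuming the standard `TRITONBACKEND` response-factory API; the function name `SendStreamedResponse` and the `factory`/`is_final` arguments are illustrative, not code from this repository:

```cpp
// Sketch only: marking the last streamed response in a decoupled Triton
// backend. `factory` and `is_final` are assumed to come from the backend's
// per-request state; error handling is omitted for brevity.
#include "triton/core/tritonbackend.h"

void SendStreamedResponse(
    TRITONBACKEND_ResponseFactory* factory, bool is_final)
{
  TRITONBACKEND_Response* response = nullptr;
  TRITONBACKEND_ResponseNewFromFactory(&response, factory);

  // ... populate output tensors (e.g. the newly generated tokens) here ...

  // On the final iteration of generation, send the response with the
  // COMPLETE_FINAL flag so the stream carries an explicit end marker.
  const uint32_t flags = is_final ? TRITONSERVER_RESPONSE_COMPLETE_FINAL : 0;
  TRITONBACKEND_ResponseSend(response, flags, nullptr /* success */);
}
```

Whether and how that flag (or an equivalent boolean output) is surfaced to the streaming client is exactly what this request asks to expose.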
Thanks for filing this issue, @yunfeng-scale! We'll get back to you once we have a chance to look into it. And we're always happy to support community MRs too!
Also need this feature!