Open kazyun opened 6 days ago
Hi @kazyun , thanks for reporting. Could you please provide the reproducer, if possible?
Just by sending streaming requests, the responses sometimes contain individual garbled characters. In the screenshot above, the input is prompt="第五项修炼". That particular case can be resolved by setting bad_word=["1."], but in other cases the garbled characters in the response cannot be resolved that way.
Simply turning on accumulate_tokens resolves the garbled-character issue, so I believe it is related to how the streamed tokens are decoded.
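For context, here is a minimal sketch (my own illustration, not code from the Triton or TensorRT-LLM codebase) of why per-chunk decoding can garble multibyte UTF-8 text, and why accumulating bytes before decoding, which is effectively what accumulate_tokens does at the token level, avoids it:

```python
# A CJK character is 3 bytes in UTF-8, so a chunk boundary between two
# streamed pieces can fall in the middle of a character.
text = "第五项修炼"
data = text.encode("utf-8")      # 15 bytes, 3 per character
chunks = [data[:7], data[7:]]    # the boundary falls inside a character

# Naively decoding each chunk on its own yields U+FFFD replacement
# characters for the split-up byte sequence.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)

# Accumulating the raw bytes first, then decoding once, recovers the text.
accumulated = b"".join(chunks).decode("utf-8")

print(naive)        # contains U+FFFD replacement characters
print(accumulated)  # the original string
```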
Description
Streaming requests return garbled characters in the response.
Triton Information
tritonserver 24.08, running the container nvcr.io/nvidia/tritonserver:24.08-trtllm-python-py3
To Reproduce
Steps to reproduce the behavior:
This issue only occurs when using a streaming request to v2/models/tensorrt_llm_bls/generate_stream (it also reproduces with the ensemble model).

payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
    "max_tokens": max_tokens,
    "stream": True,
}
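A runnable sketch of the reproducer's request body. The actual QWEN_PROMPT_TEMPLATE from the issue is not shown, so the template below is an assumed Qwen-style chat template; the endpoint path is the one named in the issue. The HTTP call itself is left as a comment since it needs a live server:

```python
import json

# Assumption: a Qwen chat-style template; the issue does not include the
# real QWEN_PROMPT_TEMPLATE, so this is a placeholder.
QWEN_PROMPT_TEMPLATE = (
    "<|im_start|>user\n{input_text}<|im_end|>\n<|im_start|>assistant\n"
)

def build_stream_payload(prompt: str, max_tokens: int = 256) -> dict:
    """Build the streaming request body from the issue report."""
    return {
        "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
        "max_tokens": max_tokens,
        "stream": True,
    }

# The body would be POSTed to the streaming endpoint, e.g.:
#   http://<host>:8000/v2/models/tensorrt_llm_bls/generate_stream
# with an SSE-capable client, reading the response chunk by chunk.
payload = build_stream_payload("第五项修炼")
print(json.dumps(payload, ensure_ascii=False))
```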
The screenshot below shows the results of non-streaming and streaming requests.
Expected behavior
The streaming response should match the result returned by v2/models/tensorrt_llm_bls/generate.