Open kazyun opened 1 month ago
This issue only occurs when using a streaming request.

```python
payload = {
    "text_input": QWEN_PROMPT_TEMPLATE.format(input_text=prompt),
    "max_tokens": max_tokens,
    "stream": True,
}
response = requests.post(server_url, json=payload, stream=True)
```
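One common source of garbled characters that shows up only with streaming is decoding each streamed chunk independently: Chinese text is multi-byte in UTF-8, and a character split across two chunks decodes to replacement characters. This is a minimal sketch of client-side incremental decoding (an assumption about where the garbling happens, not a confirmed diagnosis of this issue):

```python
import codecs

# An incremental decoder buffers an incomplete multi-byte sequence
# at the end of a chunk instead of emitting replacement characters.
decoder = codecs.getincrementaldecoder("utf-8")()

def decode_chunks(chunks):
    """Decode an iterable of byte chunks into one text string."""
    return "".join(decoder.decode(c) for c in chunks)

raw = "你好".encode("utf-8")   # 6 bytes; each character is 3 bytes
parts = [raw[:2], raw[2:]]     # split in the middle of "你"

# Naive per-chunk decoding garbles the split character:
print(parts[0].decode("utf-8", errors="replace"))  # replacement chars
# Incremental decoding reassembles it correctly:
print(decode_chunks(parts))                        # 你好
```

If the client iterates `response.iter_content()` and calls `.decode()` on each chunk, switching to an incremental decoder (or `response.iter_lines()` over complete SSE events) rules this failure mode in or out.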
Hi, I have the same problem. Is there any solution?
System Info
When using Qwen2, running inference with the engine through the run.py script produces normal output. However, when serving the same engine through Triton, some characters come out garbled and the output is truncated compared to the script's results. What could be causing this?
Maybe the config.pbtxt is causing the problem.
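If the config is the suspect, one thing worth checking (a suggestion, not a confirmed fix for this issue) is that streaming responses in Triton require the model to use the decoupled transaction policy; without it, intermediate responses can be dropped or the stream truncated. A sketch of the relevant config.pbtxt fragment:

```
# config.pbtxt fragment -- field names follow Triton's model
# configuration schema. Streaming ("stream": true) requires the
# decoupled transaction policy on the serving model.
model_transaction_policy {
  decoupled: True
}
```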
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
Get the same results as with the run.py script.
Actual behavior
When using Qwen2, running inference with the engine through the run.py script produces normal output. However, when serving the same engine through Triton, some characters come out garbled and the output is truncated compared to the script's results.
Additional notes

No.