Open g-hano opened 3 months ago
Your current environment
How would you like to use vllm
I am using the vllm API server with the following setup:
I am sending requests to the server using this Python function:
I want to display the streamed response on my Flask app's screen. The issue I'm encountering is with the structure of the streamed responses. The API server returns the response in a sequence of JSON objects like this:
On my Flask app, I want to print only the final text ("hello how are you?") on a single line, in a streaming fashion. I believe I can slice the SYSTEM_PROMPT prefix off the "text" field, but I'm unsure how to do this correctly.
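One way to approach the slicing, as a minimal sketch rather than a confirmed solution: it assumes each streamed chunk is a JSON object whose "text" field holds a list with the full text so far (prompt plus everything generated up to that point), and that the prompt is echoed verbatim at the start. The helper name `iter_new_text` and the sample chunks are hypothetical. The function strips the prompt prefix and yields only the characters that are new since the previous chunk, so they can be appended to a single line.

```python
import json
from typing import Iterable, Iterator


def iter_new_text(chunks: Iterable[bytes], prompt: str) -> Iterator[str]:
    """Yield only newly generated text from cumulative JSON chunks.

    Assumption: every chunk looks like {"text": ["<prompt><generated so far>"]}.
    """
    emitted = 0  # number of generated characters already yielded
    for raw in chunks:
        if not raw:
            continue  # skip empty chunks between delimiters
        full = json.loads(raw.decode("utf-8"))["text"][0]
        # Drop the echoed prompt if the server prepends it.
        generated = full[len(prompt):] if full.startswith(prompt) else full
        delta = generated[emitted:]
        emitted = len(generated)
        if delta:
            yield delta


if __name__ == "__main__":
    SYSTEM_PROMPT = "SYS "  # placeholder for the real prompt
    sample = [
        b'{"text": ["SYS hello"]}',
        b'{"text": ["SYS hello how"]}',
        b'{"text": ["SYS hello how are you?"]}',
    ]
    for piece in iter_new_text(sample, SYSTEM_PROMPT):
        print(piece, end="", flush=True)
    print()
```

With the `requests` library, the chunks could come from `response.iter_lines(delimiter=...)` on a request made with `stream=True`; the right delimiter depends on how the server separates its JSON objects, so check that against the server version in use. Yielding deltas instead of the full text means the browser only has to append each piece to the existing line.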
Here is the JavaScript code I am using to handle the streaming:
My Questions:
Any advice or guidance would be greatly appreciated!