Closed: BiboyQG closed this 4 months ago
Could you tell me how you achieved streaming output before? Doesn't your code return the final result directly? For example: results_iter = engine.generate(sampling_params=sampling_params, inputs=prompt_tokens, request_id=request_id)
Closing this, as I solved the issue.
Your current environment
How would you like to use vllm
I wrote a script that returns the output as a stream. With vLLM 0.4.2 my code works perfectly, but when I update to 0.4.3 and change one line of code to:
the outputs show:
which is not a problem at all when I use 0.4.2. So I wonder what changed between the two versions and how I can modify my code (which is mostly adapted from here) to make it compatible with version 0.4.3. Thanks in advance for any help!
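For anyone reading later, here is a minimal sketch of the streaming pattern discussed in this thread. It is not the original script. It assumes the engine's result iterator yields the cumulative text generated so far on each step, which is my understanding of how vLLM's async engine behaves, so the caller has to emit only the newly added part. `fake_generate`, `stream_deltas`, and `collect` are hypothetical names made up for illustration; `fake_generate` stands in for the real engine call so the example runs without vLLM or a GPU.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_generate(text: str) -> AsyncIterator[str]:
    """Hypothetical stand-in for an engine's async result iterator.

    Yields the cumulative text produced so far, growing one word per step.
    """
    words = text.split(" ")
    for i in range(1, len(words) + 1):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield " ".join(words[:i])


async def stream_deltas(results_iter: AsyncIterator[str]) -> AsyncIterator[str]:
    """Turn cumulative snapshots into only the newly added text."""
    sent = 0  # number of characters already forwarded to the client
    async for snapshot in results_iter:
        delta = snapshot[sent:]
        sent = len(snapshot)
        if delta:
            yield delta


async def collect(text: str) -> List[str]:
    """Gather every streamed chunk into a list (for demonstration only)."""
    return [chunk async for chunk in stream_deltas(fake_generate(text))]


chunks = asyncio.run(collect("streaming works chunk by chunk"))
print(chunks)
```

In a real server, each chunk would be written to the response as soon as it is produced instead of being collected into a list. Iterating the result with `async for` is what makes the output stream; awaiting only the last item returns the final result in one piece.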