Closed: nstogner closed this 4 months ago
vLLM supports streaming through the OpenAI-compatible backend that we serve. I can't remember whether I've already tested this, but I agree we should test it and make sure it works. Ideally we'd write a system test case for it.
I tested this a while ago, it works :partying_face:
Lingo should work with streaming. We should check whether it does today and, if not, update Lingo. This also requires validating that our example backends support streaming.
https://platform.openai.com/docs/api-reference/streaming
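For anyone writing the test: the streaming responses described at that link are delivered as server-sent events, where each `data:` line carries a JSON chunk and the stream ends with a `data: [DONE]` sentinel. A minimal sketch of a client-side parser a test could use to assemble the streamed text (the sample chunk bodies below are trimmed for illustration and are not a full response):

```python
import json

# Simulated body of a streaming chat completion response (SSE framing).
# Real chunks include more fields (id, model, finish_reason, ...).
raw_sse = """\
data: {"choices": [{"delta": {"role": "assistant"}}]}

data: {"choices": [{"delta": {"content": "Hello"}}]}

data: {"choices": [{"delta": {"content": ", world"}}]}

data: [DONE]
"""

def collect_stream(body: str) -> str:
    """Assemble the full completion text from SSE 'data:' lines."""
    parts = []
    for line in body.splitlines():
        if not line.startswith("data: "):
            continue  # skip the blank separator lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":  # sentinel that terminates the stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_stream(raw_sse))  # -> Hello, world
```

A system test could point this parser at the proxy with `"stream": true` set in the request body and assert that the concatenated deltas match the non-streaming completion for the same prompt.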