Closed mongodben closed 1 year ago
Instead of waiting for the LLM to generate the full response and returning it to the client in one shot, stream the response.
For an example of LLM streaming, see https://platform.openai.com/docs/api-reference/completions/create#completions/create-stream
This presumably requires both client and server work to set up the two sides of the stream.
Tutorial video using streaming that I mentioned in the meeting: https://www.youtube.com/watch?v=dXsZp39L2Jk
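A rough sketch of what the server side could look like. The OpenAI streaming API delivers tokens as Server-Sent Events (`data:` lines), so the server can relay each chunk to the client in the same framing. The `sse_format` helper and the hard-coded token list are illustrative stand-ins; in the real handler the loop would iterate over an OpenAI response created with `stream=True`.

```python
import json

def sse_format(token: str) -> str:
    # Wrap one token as a Server-Sent Events "data:" frame,
    # the same wire format the OpenAI streaming endpoint uses.
    return f"data: {json.dumps({'delta': token})}\n\n"

def stream_tokens(tokens):
    # Stand-in for the LLM token stream; a real handler would
    # iterate over the chunks of an OpenAI call with stream=True
    # and yield each delta as it arrives, instead of this list.
    for tok in tokens:
        yield sse_format(tok)

if __name__ == "__main__":
    # Each chunk can be flushed to the HTTP response as soon as
    # it is produced, so the client sees tokens incrementally.
    for chunk in stream_tokens(["Hello", ", ", "world"]):
        print(chunk, end="")
```

The client side would consume these frames with `EventSource` (or a fetch-based stream reader) and append each `delta` to the rendered response.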