I was able to quickly get something working by doing this:
import { OpenAIChatDelta, OpenAIChatModel } from "modelfusion";

export class StreamingToolModel extends OpenAIChatModel {
  // Accumulates the function call (name + arguments) across streamed deltas.
  functionCall:
    | {
        name: string;
        arguments?: string;
      }
    | undefined;

  extractTextDelta(fullDelta: OpenAIChatDelta): string | undefined {
    const msg = fullDelta[0];
    if (msg?.["delta"]?.["function_call"]) {
      if (!this.functionCall) {
        this.functionCall = {
          name: "",
          arguments: "",
        };
      }
      // The function name arrives early; the arguments arrive as JSON fragments.
      if ("name" in msg["delta"]["function_call"]) {
        this.functionCall["name"] += msg["delta"]["function_call"]["name"];
      }
      if (msg["delta"]["function_call"]["arguments"]) {
        this.functionCall["arguments"] +=
          msg["delta"]["function_call"]["arguments"];
      }
      // Emit the accumulated (still incomplete) function call as a text delta.
      return JSON.stringify(this.functionCall);
    } else {
      return undefined;
    }
  }
}
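For context, a minimal sketch of how those deltas could be consumed on the other side. The deltas iterable is a hypothetical stand-in for whatever text-delta stream modelfusion hands back; only the parsing step is shown:

// Hypothetical consumer. Each delta is the JSON.stringify'd partial function
// call that StreamingToolModel.extractTextDelta emits above.
async function consumeFunctionCallDeltas(deltas: AsyncIterable<string>) {
  for await (const delta of deltas) {
    const partialCall = JSON.parse(delta) as {
      name: string;
      arguments?: string;
    };
    // partialCall.arguments grows with every delta and is only complete,
    // valid JSON once the stream has finished.
    console.log(partialCall.name, partialCall.arguments);
  }
}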
Very cool!
I thought a bit about it earlier, but didn't see any advantage to streaming over regular generation for JSON, because you need to wait for the JSON to be fully available before it can be parsed and type-checked.
I'm curious what your use case is and what advantages streaming has for it?
I like using function calls for most things because I feel like I have more control over the LLM output and the output is more reliable. If I'm generating a JSON array, I don't need to wait for the entire array to finish generating before I start processing some of the entries. My solution has been to use a partial JSON parser to parse items out of the incomplete JSON :+1:
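A minimal sketch of that idea (not the gist linked further down the thread): cut the streamed text at the last fully closed array element, close the array, and parse only that prefix.

// Minimal sketch: pull the complete elements out of a truncated JSON array.
// Handles nested objects/arrays and strings; ignores the trailing, still
// incomplete element. Primitive number elements are not handled here.
function parseCompleteItems<T>(incompleteJson: string): T[] {
  let depth = 0;
  let inString = false;
  let escaped = false;
  let lastCompleteIndex = -1; // index just past the last fully closed element

  for (let i = 0; i < incompleteJson.length; i++) {
    const char = incompleteJson[i];
    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (char === "\\") {
        escaped = true;
      } else if (char === '"') {
        inString = false;
        if (depth === 1) lastCompleteIndex = i + 1; // top-level string closed
      }
      continue;
    }
    if (char === '"') {
      inString = true;
    } else if (char === "[" || char === "{") {
      depth++;
    } else if (char === "]" || char === "}") {
      depth--;
      if (depth === 1) lastCompleteIndex = i + 1; // top-level element closed
    }
  }

  if (lastCompleteIndex === -1) return [];
  // Close the array and parse only the complete elements.
  return JSON.parse(incompleteJson.slice(0, lastCompleteIndex) + "]") as T[];
}

Calling this on each accumulated arguments string yields the entries that have finished streaming so far.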
Ah, very cool. I'm leaning towards providing access to the raw stream responses as a first step - then you could extract the information from there. Going forward I'll think about how to integrate it into the JSON functionality.
@bjsi array streaming is totally useful - exploring this in more detail. Did you find any use cases for JSON streaming besides streaming top-level arrays item-by-item?
I also use it for the fields of objects, e.g. for a flashcard-creating assistant in a note-taking app:
import { z } from "zod";

const flashcard = z.object({
  question: z.string(),
  answer: z.string(),
});
I can start updating the editor as soon as the question field is non-empty. This really helps with responsiveness.
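Roughly, the pattern looks like this; updateEditor and the partial flashcard shape are hypothetical placeholders for whatever the app actually does:

type PartialFlashcard = { question?: string; answer?: string };

// Hypothetical editor update: called with each partially parsed flashcard
// while the rest of the object may still be streaming.
function onPartialFlashcard(
  partial: PartialFlashcard,
  updateEditor: (question: string, answer: string) => void
) {
  // The question usually finishes before the answer, so the editor can show it
  // while the rest of the card is still being generated.
  if (partial.question) {
    updateEditor(partial.question, partial.answer ?? "");
  }
}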
Also in my case I am not just requesting a simple array of flashcards but also asking GPT to cite the original sentence the flashcard was created from so I can create "citation pins" (you can see them get created at the end of the GIF). This slows the request down considerably, so it's doubly important for me to stream token by token.
Here's the gist for the partial JSON parsing I'm using: https://gist.github.com/bjsi/3ad6297345aec91460cbce5aecac939c
Thanks for sharing! So you are also interested in streaming partial text content from what I understand. Do you validate the partial responses with Zod or only the full responses?
Yeah, currently I validate using the .deepPartial() version of the complete schema.
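For anyone following along, a minimal sketch of that validation step with Zod; the schema mirrors the flashcard example above, and the value being checked is assumed to come from the partial JSON parsing step:

import { z } from "zod";

const flashcard = z.object({
  question: z.string(),
  answer: z.string(),
});

// The full result is an array of flashcards; while streaming, validate against
// a deeply partial version so half-finished items still pass.
const partialFlashcards = z.array(flashcard.deepPartial());

// e.g. after partially parsing the streamed function call arguments:
const result = partialFlashcards.safeParse([
  { question: "What is spaced repetition?" }, // answer not streamed yet
]);

if (result.success) {
  // result.data is typed as { question?: string; answer?: string }[]
  console.log(result.data);
}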
Started to work on this: #113
A first version is available in v0.35.0. See https://modelfusion.dev/guide/function/generate-structure#streamstructure
Thanks for your input!
OpenAI supports streaming function calls. Here's an example in Python: https://gist.github.com/simonmesmith/bbeb894fc4ae954b246125eb2902800b. It would be a nice feature to have in modelfusion.
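For reference, a rough TypeScript counterpart to the idea in that Python gist, using the openai Node SDK (v4-style streaming API); the model name and function definition below are placeholders:

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function streamFunctionCall() {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages: [
      { role: "user", content: "Create a flashcard about photosynthesis." },
    ],
    functions: [
      {
        name: "create_flashcard", // placeholder function definition
        parameters: {
          type: "object",
          properties: {
            question: { type: "string" },
            answer: { type: "string" },
          },
        },
      },
    ],
  });

  let args = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.function_call?.arguments) {
      args += delta.function_call.arguments; // arguments arrive as JSON fragments
      console.log(args); // only valid JSON once the stream finishes
    }
  }
}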