I was able to quickly get something working by doing this:
import { OpenAIChatDelta, OpenAIChatModel } from "modelfusion";

export class StreamingToolModel extends OpenAIChatModel {
  // Accumulates the function call (name + arguments) across streamed deltas.
  functionCall:
    | {
        name: string;
        arguments?: string;
      }
    | undefined;

  extractTextDelta(fullDelta: OpenAIChatDelta): string | undefined {
    const msg = fullDelta[0];
    if (msg?.["delta"]?.["function_call"]) {
      if (!this.functionCall) {
        this.functionCall = {
          name: "",
          arguments: "",
        };
      }
      // The function name arrives early; the arguments arrive as JSON fragments.
      if ("name" in msg["delta"]["function_call"]) {
        this.functionCall["name"] += msg["delta"]["function_call"]["name"];
      }
      if (msg["delta"]["function_call"]["arguments"]) {
        this.functionCall["arguments"] +=
          msg["delta"]["function_call"]["arguments"];
      }
      // Emit the accumulated (still incomplete) function call as a text delta.
      return JSON.stringify(this.functionCall);
    } else {
      return undefined;
    }
  }
}
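For context, a minimal sketch of how those deltas could be consumed on the other side. The deltas iterable is a hypothetical stand-in for whatever text-delta stream modelfusion hands back; only the parsing step is shown:

// Hypothetical consumer. Each delta is the JSON.stringify'd partial function
// call that StreamingToolModel.extractTextDelta emits above.
async function consumeFunctionCallDeltas(deltas: AsyncIterable<string>) {
  for await (const delta of deltas) {
    const partialCall = JSON.parse(delta) as {
      name: string;
      arguments?: string;
    };
    // partialCall.arguments grows with every delta and is only complete,
    // valid JSON once the stream has finished.
    console.log(partialCall.name, partialCall.arguments);
  }
}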
Very cool!
I thought a bit about it earlier, but didn't see any advantage to streaming over regular generation for JSON, because you need to wait for the JSON to be fully available before it can be parsed and type-checked.
I'm curious what your use case is and what advantages streaming has for it?
I like using function calls for most things because I feel like I have more control over the LLM output and the output is more reliable. If I'm generating a JSON array, I don't need to wait for the entire array to finish generating before I start processing some of the entries. My solution has been to use a partial JSON parser to parse items out of the incomplete JSON :+1:
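A minimal sketch of that idea (not the gist linked further down the thread): cut the streamed text at the last fully closed array element, close the array, and parse only that prefix.

// Minimal sketch: pull the complete elements out of a truncated JSON array.
// Handles nested objects/arrays and strings; ignores the trailing, still
// incomplete element. Primitive number elements are not handled here.
function parseCompleteItems<T>(incompleteJson: string): T[] {
  let depth = 0;
  let inString = false;
  let escaped = false;
  let lastCompleteIndex = -1; // index just past the last fully closed element

  for (let i = 0; i < incompleteJson.length; i++) {
    const char = incompleteJson[i];
    if (inString) {
      if (escaped) {
        escaped = false;
      } else if (char === "\\") {
        escaped = true;
      } else if (char === '"') {
        inString = false;
        if (depth === 1) lastCompleteIndex = i + 1; // top-level string closed
      }
      continue;
    }
    if (char === '"') {
      inString = true;
    } else if (char === "[" || char === "{") {
      depth++;
    } else if (char === "]" || char === "}") {
      depth--;
      if (depth === 1) lastCompleteIndex = i + 1; // top-level element closed
    }
  }

  if (lastCompleteIndex === -1) return [];
  // Close the array and parse only the complete elements.
  return JSON.parse(incompleteJson.slice(0, lastCompleteIndex) + "]") as T[];
}

Calling this on each accumulated arguments string yields the entries that have finished streaming so far.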
Ah, very cool. I'm leaning towards providing access to the raw stream responses as a first step - then you could extract the information from there. Going forward I'll think about how to integrate it into the JSON functionality.
@bjsi array streaming is totally useful - exploring this in more detail. Did you find any use cases for JSON streaming besides streaming top-level arrays item-by-item?
I also use it for the fields of objects, e.g. for a flashcard-creating assistant in a note-taking app:
import { z } from "zod";

const flashcard = z.object({
  question: z.string(),
  answer: z.string(),
});
I can start updating the editor as soon as the question field is non-empty. This really helps with responsiveness.
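Roughly, the pattern looks like this; updateEditor and the partial flashcard shape are hypothetical placeholders for whatever the app actually does:

type PartialFlashcard = { question?: string; answer?: string };

// Hypothetical editor update: called with each partially parsed flashcard
// while the rest of the object may still be streaming.
function onPartialFlashcard(
  partial: PartialFlashcard,
  updateEditor: (question: string, answer: string) => void
) {
  // The question usually finishes before the answer, so the editor can show it
  // while the rest of the card is still being generated.
  if (partial.question) {
    updateEditor(partial.question, partial.answer ?? "");
  }
}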
Also in my case I am not just requesting a simple array of flashcards but also asking GPT to cite the original sentence the flashcard was created from so I can create "citation pins" (you can see them get created at the end of the GIF). This slows the request down considerably, so it's doubly important for me to stream token by token.
Here's the gist for the partial JSON parsing I'm using: https://gist.github.com/bjsi/3ad6297345aec91460cbce5aecac939c
Thanks for sharing! So you are also interested in streaming partial text content from what I understand. Do you validate the partial responses with Zod or only the full responses?
Yeah, currently I validate using the .deepPartial() version of the complete schema.
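For anyone following along, a minimal sketch of that validation step with Zod; the schema mirrors the flashcard example above, and the value being checked is assumed to come from the partial JSON parsing step:

import { z } from "zod";

const flashcard = z.object({
  question: z.string(),
  answer: z.string(),
});

// The full result is an array of flashcards; while streaming, validate against
// a deeply partial version so half-finished items still pass.
const partialFlashcards = z.array(flashcard.deepPartial());

// e.g. after partially parsing the streamed function call arguments:
const result = partialFlashcards.safeParse([
  { question: "What is spaced repetition?" }, // answer not streamed yet
]);

if (result.success) {
  // result.data is typed as { question?: string; answer?: string }[]
  console.log(result.data);
}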
Started to work on this: #113
A first version is available in v0.35.0. See https://modelfusion.dev/guide/function/generate-structure#streamstructure
Thanks for your input!
OpenAI supports streaming function calls. Here's an example in Python: https://gist.github.com/simonmesmith/bbeb894fc4ae954b246125eb2902800b. It would be a nice feature to have in modelfusion.
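For reference, a rough TypeScript counterpart to the idea in that Python gist, using the openai Node SDK (v4-style streaming API); the model name and function definition below are placeholders:

import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function streamFunctionCall() {
  const stream = await openai.chat.completions.create({
    model: "gpt-3.5-turbo",
    stream: true,
    messages: [
      { role: "user", content: "Create a flashcard about photosynthesis." },
    ],
    functions: [
      {
        name: "create_flashcard", // placeholder function definition
        parameters: {
          type: "object",
          properties: {
            question: { type: "string" },
            answer: { type: "string" },
          },
        },
      },
    ],
  });

  let args = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta;
    if (delta?.function_call?.arguments) {
      args += delta.function_call.arguments; // arguments arrive as JSON fragments
      console.log(args); // only valid JSON once the stream finishes
    }
  }
}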