microsoft / TypeChat

TypeChat is a library that makes it easy to build natural language interfaces using types.
https://microsoft.github.io/TypeChat/
MIT License

Streaming Support #70

Open DanielRosenwasser opened 11 months ago

DanielRosenwasser commented 11 months ago

Streaming is a very critical part of our app. Are there any plans to support it somehow?

Originally posted by @cb-eli in https://github.com/microsoft/TypeChat/discussions/68

DanielRosenwasser commented 11 months ago

@cb-eli can you provide some background here? TypeChat is currently all about providing structured JSON results. While TypeChat could accept streaming JSON, the results always come back atomically at the moment, and I don't know if we would support other forms of streaming.

Can you elaborate on what your constraints are?

samhatoum commented 11 months ago

I'm actually looking at exactly this right now for a non-TypeChat-based app, and was wondering if there was support here.

I'm planning to handle this as follows. Say the stream has something that eventually looks like this:

{
  foo: [
    {obj: 1},
    {obj: 2},
    {obj: 3},
  ]
}

I would like to have a stream that looks like this

foo: []
foo: [{obj: 1}]
foo: [{obj: 1}, {obj: 2}]

so something like always closing objects/arrays until more parsable objects/arrays appear, and continuously adding fields as they become available.
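The "always closing objects/arrays until more parsable content appears" idea could be sketched as a small helper that tracks unclosed braces and brackets, appends the missing closers, and attempts a parse. This is a hypothetical illustration, not part of TypeChat:

```typescript
// Best-effort completion of a truncated JSON string: track unclosed
// braces/brackets (ignoring any that appear inside string literals),
// append the matching closers, and attempt to parse. Returns undefined
// if the text still isn't parsable (e.g. it ends mid-string).
function completePartialJson(text: string): unknown {
    const closers: string[] = [];
    let inString = false;
    let escaped = false;
    for (const ch of text) {
        if (escaped) {
            escaped = false;
        } else if (inString) {
            if (ch === "\\") escaped = true;
            else if (ch === '"') inString = false;
        } else if (ch === '"') {
            inString = true;
        } else if (ch === "{") {
            closers.push("}");
        } else if (ch === "[") {
            closers.push("]");
        } else if (ch === "}" || ch === "]") {
            closers.pop();
        }
    }
    if (inString) return undefined; // ends mid-string; wait for more input
    // Drop a trailing comma before closing, then append the closers
    // innermost-first (reverse of the order they were opened in).
    const candidate = text.replace(/,\s*$/, "") + closers.reverse().join("");
    try {
        return JSON.parse(candidate);
    } catch {
        return undefined; // still truncated mid-token; wait for more input
    }
}
```

Calling this on each growing prefix of the stream yields exactly the progression above: `{"foo": [` parses as `{ foo: [] }`, and each element appears once it is complete.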

samhatoum commented 11 months ago

Another variation on the above: as the fields themselves become more defined, like content being written into a string, the response would stream those partial values as well.

so

foo: [
  {obj: 1, text: 'hello '},
]
foo: [
  {obj: 1, text: 'hello w'},
]
foo: [
  {obj: 1, text: 'hello wo'},
]
foo: [
  {obj: 1, text: 'hello wor'},
]
foo: [
  {obj: 1, text: 'hello worl'},
]
foo: [
  {obj: 1, text: 'hello world'},
]
samhatoum commented 11 months ago

Here's a demo of something I cobbled together that streams JSON for inspiration:

https://recordit.co/oexbzcX7LJ

EDIT: the video frame rate doesn't do it justice, but it does actually stream the content inside strings

DanielRosenwasser commented 11 months ago

Thanks for the context - I still might need to ask a few more questions, so thanks in advance for your patience. 😄

First off, how do you intend to consume each of the streamed elements? Do you plan to immediately operate on them? What happens if a subsequent object has an error in some way?

[
  { "title": "Shake It Off", "artist": "Taylor Swift" }, // <- was fine

  // until...

  "unrelated value" // <- oops! this causes an error!

Also, in your example, you have a response object that has a foo property. Is that actually realistic, or are responses usually only streaming at the top-level?

Another variation on the above: as the fields themselves become more defined, like content being written into a string, the response would stream those partial values as well.

so

foo: [
  {obj: 1, text: 'hello '},
]
foo: [
  {obj: 1, text: 'hello w'},
]

Earlier I gave an example of later values invalidating previous results. But with this, there's also the problem of intermediate values thrashing in phases of being invalid. For example, if you expect an array of the strings "foo" or "bar", then the following will be invalid:

[
  "foo",
  "bar",
  "fo  // <- what here? effectively gets auto-corrected to `"foo"`

I think at best one could imagine a version where whole values in an array are streamed - but those must be complete.
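That "whole, complete values only" idea could look roughly like the following: an async generator that scans a streamed top-level JSON array and yields each element only once it has streamed in full, so partially written values like `"fo` are never surfaced. A hypothetical sketch, not a TypeChat API:

```typescript
// Yield only whole, complete elements of a streamed top-level JSON
// array. Elements are delimited by commas/closing bracket at nesting
// depth 0 (string literals are skipped over), and each one is run
// through JSON.parse before being yielded.
async function* completeElements(chunks: AsyncIterable<string>) {
    let started = false;   // seen the opening "[" of the top-level array?
    let depth = 0;         // nesting depth inside the current element
    let inString = false;
    let escaped = false;
    let element = "";      // raw text of the element being accumulated
    for await (const chunk of chunks) {
        for (const ch of chunk) {
            if (!started) {
                if (ch === "[") started = true;
                continue;
            }
            if (escaped) {
                escaped = false;
                element += ch;
            } else if (inString) {
                if (ch === "\\") escaped = true;
                else if (ch === '"') inString = false;
                element += ch;
            } else if (ch === '"') {
                inString = true;
                element += ch;
            } else if (ch === "{" || ch === "[") {
                depth++;
                element += ch;
            } else if (ch === "]" && depth === 0) {
                // End of the top-level array; flush the final element.
                const text = element.trim();
                if (text) yield JSON.parse(text);
                return;
            } else if (ch === "}" || ch === "]") {
                depth--;
                element += ch;
            } else if (ch === "," && depth === 0) {
                // A whole element just finished streaming.
                yield JSON.parse(element.trim());
                element = "";
            } else {
                element += ch;
            }
        }
    }
}
```

Each yielded value is complete and could in principle be validated against a schema individually, though a later chunk can still invalidate the response as a whole, as in the example above.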

DanielRosenwasser commented 11 months ago

Another way to reframe this - if you'd be able to describe what the desired user experience you're trying to build here is, that would actually be the most helpful context!

cb-eli commented 11 months ago

Hi Daniel, Thanks for opening this thread.

I agree with @samhatoum regarding the expected output from TypeChat.

Our use case is similar to a chat app. The user submits his input and we stream back the result as text in Markdown syntax (as defined in the prompt).

We need the data to be structured to visualize the results. During the streaming we parse and visualize the Markdown text. When the full result returns, we parse it and save it as JSON for future use.

If more information is needed, I would be glad to provide it.

DanielRosenwasser commented 11 months ago

Our use case is similar to a chat app. The user submits his input and we stream back the result as text in Markdown syntax (as defined in the prompt).

We need the data to be structured to visualize the results. During the streaming we parse and visualize the Markdown text. When the full result returns, we parse it and save it as JSON for future use.

@cb-eli I think that's actually possible with the current API!

In your case, I believe you could create a custom TypeChatLanguageModel. Under the hood, you would connect to a language model API and the completer function could just be made to invoke a callback every time content comes in.

Here's an example of a streaming completer that uses node-fetch to open a streamed connection to OpenAI's chat completions REST API, which logs each streamed piece of content to standard output.

import assert from "assert";
import fetch from "node-fetch";
import dotenv from "dotenv";
import { TypeChatLanguageModel, success } from "typechat";

dotenv.config();

const modelName = process.env.OPENAI_MODEL;
const apiKey = process.env.OPENAI_API_KEY;

assert(modelName);
assert(apiKey);

const model: TypeChatLanguageModel = {
    complete: createStreamingCompleter(s => process.stdout.write(s))
};

function createStreamingCompleter(onContent: (content: string) => void) {
    return async function complete(prompt: string) {
        const response = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiKey}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                model: modelName,
                stream: true,
                temperature: 0,
                n: 1,
                messages: [
                    { role: "user", content: prompt }
                ],
            })
        });

        assert(response.body);

        let result = "";
        for await (const data of parseDataEvents(response.body)) {
            const deltaContent = JSON.parse(data).choices[0].delta.content;
            if (deltaContent !== undefined) {
                onContent(deltaContent);
                result += deltaContent;
            }
        }
        return success(result);
    }
}

async function* parseDataEvents(stream: AsyncIterable<Buffer | string>) {
    let tmp = "";
    for await (let part of stream) {
        part = part.toString();
        let start = 0;
        let end = 0;
        while ((end = part.indexOf("\n\n", start)) >= 0) {
            tmp += part.slice(start, end);
            if (tmp.startsWith("data: ")) {
                tmp = tmp.slice("data: ".length);
                if (tmp === "[DONE]") {
                    return;
                }
                yield tmp;
            }
            tmp = "";
            start = end + 2;
        }
        // Buffer any partial event left over at the end of this chunk.
        tmp += part.slice(start);
    }
}
And you can try it in action on a variation of our sentiment example.

import assert from "assert";
import dotenv from "dotenv";
import fetch from "node-fetch";
import { TypeChatLanguageModel, createJsonTranslator, processRequests, success } from "typechat";

dotenv.config();

const modelName = process.env.OPENAI_MODEL;
const apiKey = process.env.OPENAI_API_KEY;

assert(modelName);
assert(apiKey);

const model: TypeChatLanguageModel = {
    complete: createStreamingCompleter(s => process.stdout.write(s))
};

export interface SentimentResponse {
    sentiment: "positive" | "negative" | "neutral";
}

const schema = `export interface SentimentResponse {
    sentiment: "positive" | "negative" | "neutral";
}`;

const translator = createJsonTranslator(model, schema, "SentimentResponse");

processRequests("> ", undefined, async (req) => {
    const result = await translator.translate(req);
    console.log("\n------\n");
    if (result.success) {
        console.log(result.data);
    } else {
        console.log("Uh oh!");
        process.exit(-1);
    }
});

function createStreamingCompleter(onContent: (content: string) => void) {
    return async function complete(prompt: string) {
        const response = await fetch("https://api.openai.com/v1/chat/completions", {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiKey}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                model: modelName,
                stream: true,
                temperature: 0,
                n: 1,
                messages: [
                    { role: "user", content: prompt }
                ],
            })
        });

        assert(response.body);

        let result = "";
        for await (const data of parseDataEvents(response.body)) {
            const deltaContent = JSON.parse(data).choices[0].delta.content;
            if (deltaContent !== undefined) {
                onContent(deltaContent);
                result += deltaContent;
            }
        }
        return success(result);
    }
}

async function* parseDataEvents(stream: AsyncIterable<Buffer | string>) {
    let tmp = "";
    for await (let part of stream) {
        part = part.toString();
        let start = 0;
        let end = 0;
        while ((end = part.indexOf("\n\n", start)) >= 0) {
            tmp += part.slice(start, end);
            if (tmp.startsWith("data: ")) {
                tmp = tmp.slice("data: ".length);
                if (tmp === "[DONE]") {
                    return;
                }
                yield tmp;
            }
            tmp = "";
            start = end + 2;
        }
        tmp += part.slice(start);
    }
}

Note that parseDataEvents is just something quick and dirty I wrote up to parse server-sent events. I haven't tested it, so you may want to look into solutions like https://www.npmjs.com/package/eventsource.

Let me know if this is a reasonable solution for your app. We've recently been considering a page on custom models and this streaming example could be a good fit for that.

johnnyreilly commented 11 months ago

I'm not sure if this is helpful or not, but in this (unrelated) issue where I'm reporting a problem with linked back ends in Azure static web apps I shared an example of what a front end that consumed an async text stream from a C# back end looks like:

https://github.com/Azure/static-web-apps/issues/1180

I share in case it proves useful background - feel free to ignore if not relevant

cb-eli commented 11 months ago

Our use case is similar to a chat app. The user submits his input and we stream back the result as text in Markdown syntax (as defined in the prompt). We need the data to be structured to visualize the results. During the streaming we parse and visualize the Markdown text. When the full result returns, we parse it and save it as JSON for future use.

@cb-eli I think that's actually possible with the current API!

In your case, I believe you could create a custom TypeChatLanguageModel. Under the hood, you would connect to a language model API and the completer function could just be made to invoke a callback every time content comes in.

Here's an example of a streaming completer that uses node-fetch to open a streamed connection to OpenAI's chat completions REST API, which logs each streamed piece of content to standard output.


The first character I receive in the streamed response for my prompt is \n. When calling const result = await translator.translate(req); the result is:

{
  success: false,
  message: "Response is not JSON:\n",
}

and it exits.

Am I missing something?
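One guess, an assumption rather than anything confirmed in this thread: surrounding whitespace such as that leading \n may need to be stripped, and undefined deltas skipped, before handing the accumulated text back to TypeChat. A sketch of that accumulation step in isolation:

```typescript
// Hypothetical variant of the completer's accumulation loop: ignore
// undefined deltas (the first streamed chunk often carries only a role)
// and trim surrounding whitespace before returning the text.
function accumulate(deltas: (string | undefined)[]): string {
    let result = "";
    for (const delta of deltas) {
        if (delta !== undefined) {
            result += delta;
        }
    }
    return result.trim();
}
```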

DanielRosenwasser commented 11 months ago

I'm not sure - I'm definitely able to run this example locally on OpenAI's gpt-3.5-turbo successfully.

[Animation: TypeChat sentiment example using streaming responses]

samhatoum commented 11 months ago

My use-case is that I have a diagram that's being built up incrementally as the data becomes available.

I have to say, I'm currently working without types at all and using only JSON, so what I'm doing in the more freeform way may not be suitable for a typed approach as I think about it. I can see what you mean in terms of whole values being streamed.

DanielRosenwasser commented 11 months ago

My use-case is that I have a diagram that's being built up incrementally as the data becomes available.

I have to say, I'm currently working without types at all and using only JSON, so what I'm doing in the more freeform way may not be suitable for a typed approach as I think about it. I can see what you mean in terms of whole values being streamed.

Yeah, I think there are shades of gray here - a big part of the approach with TypeChat is that it enables you to guarantee a specific well-typed structure. If you're getting objects that are meant to model pure associative data like a map (which you might be using for these diagrams) then it might be overkill to use TypeChat to guarantee that you have a Record<string, string>.

That said, we might be able to do something better here over time.