openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

How to use stream: true? #18

Closed raphaelrk closed 1 year ago

raphaelrk commented 2 years ago

I'm a bit lost as to how to actually use stream: true in this library.

Example of incorrect syntax:

const res = await openai.createCompletion({
  model: "text-davinci-002",
  prompt: "Say this is a test",
  max_tokens: 6,
  temperature: 0,
  stream: true,
});

res.onmessage = (event) => {
  console.log(event.data);
}
ckarsan commented 1 year ago

@justinmahar it seems to strip out the triple backticks ```

Hmm, it seems more like the response from OpenAI doesn't always contain the backticks. I will try adjusting the prompts and do some more testing.

micksabox commented 1 year ago

I've spent a few hours trying options and searching for a full-stack solution. Not only did I need it to work on Vercel runtime, I needed to parse the data on a React front-end. I found https://github.com/usellm to be seamless and "just works".

jaankoppe commented 1 year ago

I've spent a few hours trying options and searching for a full-stack solution. Not only did I need it to work on Vercel runtime, I needed to parse the data on a React front-end. I found https://github.com/usellm to be seamless and "just works".

Not working on Remix.

victorpan77 commented 1 year ago

@gfortaine we actually use @microsoft/fetch-event-source for the playground to do streaming with POST 👍

Thank you all for sharing your solutions here! I agree that @smervs solution currently looks like the best option available for the openai-node package. Here's a more complete example with proper error handling and no extra dependencies:

try {
    const res = await openai.createCompletion({
        model: "text-davinci-002",
        prompt: "It was the best of times",
        max_tokens: 100,
        temperature: 0,
        stream: true,
    }, { responseType: 'stream' });

    res.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                return; // Stream finished
            }
            try {
                const parsed = JSON.parse(message);
                console.log(parsed.choices[0].text);
            } catch(error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
} catch (error) {
    if (error.response?.status) {
        console.error(error.response.status, error.message);
        error.response.data.on('data', data => {
            const message = data.toString();
            try {
                const parsed = JSON.parse(message);
                console.error('An error occurred during OpenAI request: ', parsed);
            } catch(error) {
                console.error('An error occurred during OpenAI request: ', message);
            }
        });
    } else {
        console.error('An error occurred during OpenAI request', error);
    }
}

This could probably be refactored into a streamCompletion helper function (that uses either callbacks or es6 generators to emit new messages).
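For reference, a minimal sketch of such a helper written as an async generator might look like this (not from the original comment; it assumes the same axios responseType: 'stream' response shown above):

// Sketch only: yields parsed API chunks from an axios stream response (res.data).
async function* streamCompletion(stream) {
    let buffer = '';
    for await (const chunk of stream) {
        buffer += chunk.toString();
        const lines = buffer.split('\n');
        buffer = lines.pop(); // keep any partial line for the next chunk
        for (const line of lines) {
            const message = line.replace(/^data: /, '').trim();
            if (!message) continue;
            if (message === '[DONE]') return;
            yield JSON.parse(message);
        }
    }
}

// Usage with the example above:
// for await (const parsed of streamCompletion(res.data)) {
//     console.log(parsed.choices[0].text);
// }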

Apologies there's not an easier way to do this within the SDK itself – the team will continue evaluating how to get this added natively, despite the lack of support in the current sdk generator tool we're using.

HELP! I am very new to JavaScript and Node and I am developing a voice assistant. I managed to get this code working, but the problem is that I need the first 20 tokens of the response in a variable as soon as they are available, so I can start responding to the user immediately without having to wait for the full response, which can take too long to generate and time out the session.

I am not able to do this because, due to the asynchrony, I need to wrap this code in a promise, and then I have to wait for the generation of all the tokens to complete before I get the result of the promise. That is, I need the main code to have that variable with the first 20 tokens as soon as they are generated, and then to keep generating the rest of the tokens until the end and collect them in another variable.

If someone could help me I would be very grateful, because I have tried many ways and I am not able to do it. I am going crazy! I don't want to use external libraries because I want to keep control of the code. Thank you very much in advance!
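One way to approach the first-20-tokens requirement, sketched under the same responseType: 'stream' setup as above (the function name and the two returned promises are hypothetical, not part of any library):

// Sketch: resolve one promise as soon as the first N tokens arrive,
// and a second promise with the full text when the stream ends.
function completionWithEarlyTokens(res, firstN = 20) {
    const tokens = [];
    let resolvedEarly = false;
    let resolveEarly, resolveFull;
    const firstTokens = new Promise((r) => (resolveEarly = r));
    const fullText = new Promise((r) => (resolveFull = r));

    res.data.on('data', (data) => {
        const lines = data.toString().split('\n').filter((line) => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') continue;
            try {
                const parsed = JSON.parse(message);
                tokens.push(parsed.choices[0].text);
                if (!resolvedEarly && tokens.length >= firstN) {
                    resolvedEarly = true;
                    resolveEarly(tokens.join('')); // available immediately, before the stream ends
                }
            } catch (e) {
                // ignore chunks split mid-JSON in this sketch
            }
        }
    });

    res.data.on('end', () => {
        if (!resolvedEarly) resolveEarly(tokens.join(''));
        resolveFull(tokens.join(''));
    });

    return { firstTokens, fullText };
}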

syonfox commented 1 year ago

https://github.com/syonfox/GPT-3-Encoder/blob/7013e4097f15ef400554af6ea04248da561c3c59/demo_app/streamOne.js

OK, so I plan to work on this more in the future, but there are many options.

I believe the best solution is to use the fetch API, since it is standard. This solution works for both chat and completion streaming, but needs cleanup. The main problem is that old versions of Node do not have fetch, so you need Node 18 or a browser, or it needs to be bundled with a proper fetch polyfill.

This way it depends on nothing, not even the openai lib.

Hope this helps; I will solve this more properly in a few days. KISS.

I think it's one refactor away. I'll do it later. Cheers
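Along those lines, a rough no-dependency sketch with fetch (assuming Node 18+ or a browser; the model and the onToken callback are placeholders):

// Sketch only: stream a chat completion over fetch and hand each text delta to a callback.
async function streamChat(apiKey, messages, onToken) {
    const res = await fetch('https://api.openai.com/v1/chat/completions', {
        method: 'POST',
        headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${apiKey}`,
        },
        body: JSON.stringify({ model: 'gpt-3.5-turbo', messages, stream: true }),
    });

    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop(); // keep any partial line for the next chunk
        for (const line of lines) {
            const data = line.replace(/^data: /, '').trim();
            if (!data || data === '[DONE]') continue;
            const delta = JSON.parse(data).choices[0]?.delta?.content;
            if (delta) onToken(delta);
        }
    }
}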


dosstx commented 1 year ago

Anyone using Nuxt 3? Or does anyone have clues as to why data does not stream on Netlify vs. localhost?

Running locally, the data is streamed and the text messages appear incrementally in the UI. However, running the same code on a production server (Netlify), the data just appears all at once instead of incrementally. Does anyone know why?

Is it the way I am hosting the app on Netlify? Perhaps I should be deploying the app to Netlify Edge instead of standard Netlify? https://nitro.unjs.io/deploy/providers/netlify#netlify

fukemy commented 1 year ago

streamCompletion

I got this error:

The provided value 'stream' is not a valid 'responseType'.
 WARN  Possible Unhandled Promise Rejection (id: 0):
TypeError: undefined is not a function
TypeError: undefined is not a function
jincdream commented 1 year ago

easy

server

// just use openai.js and next.js
const response = await openai.createChatCompletion({
    // your params
}, {
    responseType: 'stream'
});

// response
// @ts-ignore
response.data.pipe(res);

client:

const response = await fetch('/api/chat', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
        'Connection': 'keep-alive',
        'Response-Type': 'stream',
    },
    body: JSON.stringify({
        // your messages
        messages
    }),
});

if (response?.status !== 200) {
    // throw error
    return
}
const reader = response?.body?.getReader();
const decoder = new TextDecoder('utf-8');
let aiMessage = ''
while (true) {

    const { value, done } = await reader?.read() || {};
    if (done) break;
    const chunk = decoder.decode(value);
    // json parser `data:xxxx`
    aiMessage += getChatData(chunk);

}
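getChatData isn't defined in the snippet above; a minimal sketch of what such a parser could look like (a guess at its intent, based on the data:xxxx comment) is:

// Hypothetical helper: pull the delta text out of the `data: {...}` lines in a raw SSE chunk.
function getChatData(chunk) {
    let text = '';
    for (const line of chunk.split('\n')) {
        const data = line.replace(/^data: /, '').trim();
        if (!data || data === '[DONE]') continue;
        try {
            text += JSON.parse(data).choices?.[0]?.delta?.content || '';
        } catch (e) {
            // ignore JSON split across chunk boundaries in this sketch
        }
    }
    return text;
}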
dannysheridan commented 1 year ago

May want to check out this library that supports streaming for the OpenAI API: npm github

lucgagan commented 1 year ago

A solution I've put together in absence of the official support:

import { got } from 'got';

type Role = 'agent' | 'user';

export type ChatCompletionRequestMessage = {
  content: string;
  role: Role;
};

export const createChatCompletion = ({
  model,
  messages,
}: {
  messages: ChatCompletionRequestMessage[];
  model: string;
}): Promise<ChatCompletionRequestMessage[]> => {
  return new Promise((resolve, reject) => {
    const stream = got.stream('https://api.openai.com/v1/chat/completions', {
      headers: {
        // eslint-disable-next-line node/no-process-env
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      json: {
        messages,
        model,
        stream: true,
      },
      method: 'POST',
    });

    const choices: ChatCompletionRequestMessage[] = [];

    stream.on('data', (chunk) => {
      const body = chunk.toString();

      for (const message of body.split('\n')) {
        if (message === '') {
          continue;
        }

        if (message === 'data: [DONE]') {
          continue;
        }

        if (!message.startsWith('data: ')) {
          continue;
        }

        const json = JSON.parse(message.toString().slice('data: '.length)) as {
          choices: Array<{
            delta: { content?: string; role?: string };
            index: number;
          }>;
        };

        for (const choice of json.choices) {
          choices[choice.index] = choices[choice.index] ?? {};

          if (choice.delta.role) {
            choices[choice.index].role = choice.delta.role as Role;
          }

          if (choice.delta.content) {
            choices[choice.index].content = choices[choice.index].content ?? '';
            choices[choice.index].content += choice.delta.content;
          }
        }
      }
    });

    const intervalId = setInterval(() => {
      for (const choice of choices) {
        // eslint-disable-next-line no-console
        console.log('---\n' + choice.content + '\n---');
      }
    }, 5_000);

    stream.on('error', (error) => {
      clearInterval(intervalId);

      reject(error);
    });

    stream.on('end', () => {
      clearInterval(intervalId);

      resolve(choices);
    });
  });
};
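A usage sketch for the helper above (it assumes OPENAI_API_KEY is set; the model and prompt are arbitrary examples):

const choices = await createChatCompletion({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Say this is a test' }],
});

console.log(choices[0]?.content);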
dosstx commented 1 year ago

@lucgagan Does that code work with edge functions? Many solutions in this thread, but few work with edge functions (and none so far with Nuxt).

lucgagan commented 1 year ago

Don't know. I don't use edge functions.

xkungfu commented 1 year ago

What a long issue for OpenAI streaming. Why has no one tried to build a package?

Only a paranoid person would read such a long post.

lucgagan commented 1 year ago

Okay, I published my first open-source project to make this work.

https://github.com/lucgagan/completions

import { createChat } from "completions";

/**
 * @property apiKey - OpenAI API key.
 * @property frequencyPenalty - Number between -2.0 and 2.0. Positive values penalize new
 *    tokens based on their existing frequency in the text so far, decreasing the model's
 *    likelihood to repeat the same line verbatim.
 * @property logitBias - Modify the likelihood of specified tokens appearing in the completion.
 *    Accepts an object that maps token IDs to an associated bias value from -100 to 100.
 * @property maxTokens – The maximum number of tokens to generate in the chat completion.
 *    The total length of input tokens and generated tokens is limited by the model's context length.
 * @property model - ID of the model to use. See the model endpoint compatibility table for
 *    details on which models work with the Chat API.
 * @property n - How many chat completion choices to generate for each input message.
 * @property presencePenalty - Number between -2.0 and 2.0. Positive values penalize new
 *    tokens based on whether they appear in the text so far, increasing the model's
 *    likelihood to talk about new topics.
 * @property stop - Up to 4 sequences where the API will stop generating further tokens.
 * @property temperature - What sampling temperature to use, between 0 and 2. Higher values
 *    like 0.8 will make the output more random, while lower values like 0.2 will make it
 *    more focused and deterministic.
 *    We generally recommend altering this or top_p but not both.
 * @property topP - An alternative to sampling with temperature, called nucleus sampling,
 *    where the model considers the results of the tokens with top_p probability mass.
 *    So 0.1 means only the tokens comprising the top 10% probability mass are considered.
 *    We generally recommend altering this or temperature but not both.
 * @property user - A unique identifier representing your end-user, which can help OpenAI
 *    to monitor and detect abuse.
 */
const chat = createChat({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-3.5-turbo",
  // or:
  // model: 'gpt-4',
});

await chat.sentMessage("Ping");

You can stream the response too:

await chat.sentMessage("continue the sequence: a b c", (message) => {
  console.log(message);
});
sidujjain commented 1 year ago

I've spent a few hours trying options and searching for a full-stack solution. Not only did I need it to work on Vercel runtime, I needed to parse the data on a React front-end. I found https://github.com/usellm to be seamless and "just works".

Not working on Remix.

@jaankoppe I've added an example for Remix here: https://github.com/usellm/example-remix-app Let me know if it works.

Kypaku commented 1 year ago

My Solution, using "request". Also, you can check how it works here: https://github.com/Kypaku/gpt-simple-api https://github.com/Kypaku/vue-gpt-playground

import request from "request";
import AbortController from "abort-controller";
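// Note: KEY, fData, fEnd, resolve, and reject below are assumed to come from the
// enclosing Promise/callback wrapper in the full examples linked above.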

const abortController = new AbortController();

const model = "gpt-4";

const isChatModel = true;
const _prompt = "Generate 200 words"
const messages = [{ role: "user", content: _prompt as string }];

const endpoint = isChatModel ? "/v1/chat/completions" : "/v1/completions";
const signal = abortController.signal;

const body = JSON.stringify({
    model,
    prompt: isChatModel ? undefined : _prompt,
    messages: messages,
    temperature: 0,
    max_tokens: 2000,
    top_p: 1,
    stream: true,
});

const req = request({
    url: "https://api.openai.com" + endpoint,
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        Authorization: "Bearer " + KEY
    },
    body: body,
});

req.on("error", (e: any) => {
    console.error("problem with request:" + e.message);
});

// req.write(body);
req.on("data", (chunk: any) => {
    try {
        let delta = "";
        if (chunk?.toString().match(/^\{\n\s+\"error\"\:/)) {
            console.error("getStream error:", chunk.toString());
            reject(JSON.parse(chunk.toString().trim()));
            return;
        }
        const lines = chunk?.toString()?.split("\n") || [];
        const filteredLines = lines.filter((line: string) => line.trim());
        const line = filteredLines[filteredLines.length - 1];
        const data = line.toString().replace("data:", "").replace("[DONE]", "").replace("data: [DONE]", "").trim();
        if (data) {
            const json = JSON.parse(data);
            json.choices.forEach((choice: any) => {
                delta += choice.text || choice.message?.content || choice.delta?.content || "";
            });
            fData(delta, json, chunk.toString());
        }
    } catch (e) {
        console.error("getStream handle chunk error:", e, chunk.toString());
    }
})

req.on("end", () => {
    fEnd?.();
    resolve();
});

req.on("abort", () => {
    fEnd?.();
    resolve();
})

setTimeout(() => req.abort(), 5000)
lucgagan commented 1 year ago

@Kypaku any reason for not using https://npmjs.com/completions ?

Kypaku commented 1 year ago

@Kypaku any reason for not using https://npmjs.com/completions ?

Interesting! Do you have a demo?

lucgagan commented 1 year ago

I've not had one before, but now I do! https://replit.com/@LucGagan/Completions?v=1

zolero commented 1 year ago

Really? No one? Convert the response data to a Readable stream and listen for the readable event. Hope this helps you all, folks.

    // Note: Readable comes from Node's stream module: import { Readable } from "stream";
    /**
     * Stream OpenAI chat completion
     * @param message
     */
    async streamChatCompletion({ message }: { message: string }) {
        console.time("chatCompletion");
        const response = await this.openai.createChatCompletion(
            {
                model: "gpt-3.5-turbo",
                stream: true,
                messages: [
                    {
                        content: message,
                        role: "user"
                    }
                ],
                max_tokens: 512,
                temperature: 0.9
            },
            { responseType: "stream" }
        );

        const readable = Readable.from(response.data);
        readable.on("readable", () => {
            let chunk;
            while (null !== (chunk = readable.read())) {
                console.log(`read: ${chunk}`);
            }
        });
        console.timeEnd("chatCompletion");
    }
lucgagan commented 1 year ago

Really? No one? Convert the response data to a Readable Stream. And listen on the event readable. Hope this helps you all folks.

    /**
     * Stream OpenAI chat completion
     * @param message
     * @param from
     */
    async streamChatCompletion({ message }: { message: string }) {
        console.time("chatCompletion");
        const response = await this.openai.createChatCompletion(
            {
                model: "gpt-3.5-turbo",
                stream: true,
                messages: [
                    {
                        content: message,
                        role: "user"
                    }
                ],
                max_tokens: 512,
                temperature: 0.9
            },
            { responseType: "stream" }
        );

        const readable = Readable.from(response.data);
        readable.on("readable", () => {
            let chunk;
            while (null !== (chunk = readable.read())) {
                console.log(`read: ${chunk}`);
            }
        });
        console.timeEnd("chatCompletion");
    }

That only works in Node.js.

Most of this thread is about how to make it work in both Node.js and the browser. This is why I built the https://github.com/lucgagan/completions abstraction.

FengZi-lv commented 1 year ago

@gfortaine we actually use @microsoft/fetch-event-source for the playground to do streaming with POST 👍

Thank you all for sharing your solutions here! I agree that @smervs solution currently looks like the best option available for the openai-node package. Here's a more complete example with proper error handling and no extra dependencies:

try {
    const res = await openai.createCompletion({
        model: "text-davinci-002",
        prompt: "It was the best of times",
        max_tokens: 100,
        temperature: 0,
        stream: true,
    }, { responseType: 'stream' });

    res.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                return; // Stream finished
            }
            try {
                const parsed = JSON.parse(message);
                console.log(parsed.choices[0].text);
            } catch(error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
} catch (error) {
    if (error.response?.status) {
        console.error(error.response.status, error.message);
        error.response.data.on('data', data => {
            const message = data.toString();
            try {
                const parsed = JSON.parse(message);
                console.error('An error occurred during OpenAI request: ', parsed);
            } catch(error) {
                console.error('An error occurred during OpenAI request: ', message);
            }
        });
    } else {
        console.error('An error occurred during OpenAI request', error);
    }
}

This could probably be refactored into a streamCompletion helper function (that uses either callbacks or es6 generators to emit new messages).

Apologies there's not an easier way to do this within the SDK itself – the team will continue evaluating how to get this added natively, despite the lack of support in the current sdk generator tool we're using.

Hi @schnerd, but how to get res.usage.total_tokens when using this method?

schnerd commented 1 year ago

Hi @schnerd, but how to get res.usage.total_tokens when using this method?

@FengZi-lv usage data is not currently available in streamed responses, we may look into adding this in the future.

schnerd commented 1 year ago

Folks – I'm happy to report that we're working on a new version of our SDK that supports streaming out of the box! We'd love to have some devs start testing it so we can work out all the kinks before releasing it.

Check out the v4.0.0 Beta discussion to learn more.

(and please keep discussion about the new version there, rather than in this issue)

Hebilicious commented 1 year ago

If you're still wondering how to use stream:true, vercel just released a neat package to do this : https://github.com/vercel-labs/ai

dosstx commented 1 year ago

If you're still wondering how to use stream:true, vercel just released a neat package to do this : https://github.com/vercel-labs/ai

Have you tried it with Nuxt? I tried to deploy it (cloned repo and deployed) but get 502 error when running on vercel-edge. Works fine locally (again).

I'll try the 4.0 beta mentioned above.

Hebilicious commented 1 year ago

If you're still wondering how to use stream:true, vercel just released a neat package to do this : vercel-labs/ai

Have you tried it with Nuxt? I tried to deploy it (cloned repo and deployed) but get 502 error when running on vercel-edge. Works fine locally (again).

I'll try the 4.0 beta mentioned above.

For Nuxt you will need a patched version of h3 and nitro for your provider as it currently will only work with node based environments (See working poc here)

oldturkey commented 1 year ago

I currently can't download this version.

oldturkey commented 1 year ago

Folks – I'm happy to report that we're working on a new version of our SDK that supports streaming out of the box! We'd love to have some devs start testing it so we can work out all the kinks before releasing it.

Check out the v4.0.0 Beta discussion to learn more.

(and please keep discussion about the new version there, rather than in this issue)

I currently can't download this version.

dandonahoe commented 1 year ago

Here's a TypeScript version with type annotations and all put into one single standalone file.

Got some inspiration here: StackOverflow

To call this code, use:


import ChatGPT from './ChatGPT';
const completion = await ChatGPT.createSimpleCompletion(prompt);

/// `ChatGPT.tsx`

import { Configuration, OpenAIApi, CreateCompletionResponse } from 'openai';
import { IncomingMessage } from 'http';

const configuration = new Configuration({
    apiKey: process.env.OPENAI_API_KEY,
});
const OpenAI = new OpenAIApi(configuration);

async function* chunksToLines(chunksAsync : IncomingMessage) : AsyncGenerator<string> {
    let previous = '';

    for await (const chunk of chunksAsync) {

        const bufferChunk = Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk);

        previous += bufferChunk;

        let eolIndex;

        while ((eolIndex = previous.indexOf('\n')) >= 0) {

            // line includes the EOL
            const line = previous.slice(0, eolIndex + 1).trimEnd();

            if (line === 'data: [DONE]')
                break;

            if (line.startsWith('data: '))
                yield line;

            previous = previous.slice(eolIndex + 1);
        }
    }
}

async function* linesToMessages(
    linesAsync : AsyncGenerator<string>,
) : AsyncGenerator<string> {

    for await (const line of linesAsync) {

        const message = line.substring('data: '.length);

        yield message;
    }
}

async function* streamCompletion(data : IncomingMessage) : AsyncGenerator<string> {
    yield* linesToMessages(chunksToLines(data));
}

const createSimpleCompletion = async (prompt : string) : Promise<string> => {
    let completeText = '';

    const completion = await OpenAI.createCompletion({
        model      : 'text-davinci-003',
        prompt,
        max_tokens : 1000,
        stream     : true,
    }, {
        responseType : 'stream',
    });

    for await (const message of streamCompletion(completion.data as unknown as IncomingMessage))
        try {
            // convert to the openai response type

            const parsed = JSON.parse(message) as CreateCompletionResponse;

            const { text } = parsed.choices[0];

            completeText += text;

        } catch (error) {
            console.error('Could not JSON parse stream message', message, error);
        }

    return completeText.trim();
};

const ChatGPT = {
    createSimpleCompletion,
};

export default ChatGPT;
lucgagan commented 1 year ago

If you prefer to not depend on packages, you can just copy paste the source from https://github.com/lucgagan/completions

It is just couple of files, and it neatly handles input and output validation.

FengZi-lv commented 1 year ago

Hi @schnerd, but how to get res.usage.total_tokens when using this method?

@FengZi-lv usage data is not currently available in streamed responses, we may look into adding this in the future.

So is there any way to get total_tokens now?
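Not from the streamed response itself, but one workaround is to count tokens locally, e.g. with the GPT-3-Encoder package linked earlier in this thread (a sketch; promptText and streamedText are placeholders, and the count is only an approximation for chat models):

const { encode } = require('gpt-3-encoder');

const promptTokens = encode(promptText).length;
const completionTokens = encode(streamedText).length;
const totalTokens = promptTokens + completionTokens;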

ykdojo commented 1 year ago

Hi, I just came across this problem and this thread.

Would it be helpful if I sent a PR to add this feature?

RobertCraigie commented 1 year ago

Hey @ykdojo, there's a new beta version of the SDK in the works that supports streaming out of the box. See this discussion for more details, all feedback welcome!

Shaykuu commented 1 year ago

I've been trying to do this, but I'm butting up against Google App Engine's buffering 'feature' which negates the streaming. I'm thinking of migrating servers, but I need a simple solution that will not buffer - server management is not my strong suit. I'm considering Heroku. Any advice?

RobertCraigie commented 1 year ago

Hey @Shaykuu, it looks like you can disable buffering, does this solve your issue?

rattrayalex commented 1 year ago

You can use stream: true in our upcoming v4 like so:

const stream = await client.completions.create({
  prompt: 'Say this is a test',
  model: 'text-davinci-003',
  stream: true,
});

for await (const part of stream) {
  process.stdout.write(part.choices[0]?.text || '');
}

A full example is here: https://github.com/openai/openai-node/blob/v4/examples/demo.ts

We expect to add additional conveniences for working with completion streams in the near future.

If you encounter any problems, please open a new issue!

EDIT: there's a bug for this in beta.3; beta.2 or beta.4 (which should be released in a few hours) will work in most environments.
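For chat models, the equivalent v4 pattern looks like this (a sketch based on the example above; the model name is only an example):

const stream = await client.chat.completions.create({
  model: 'gpt-3.5-turbo',
  messages: [{ role: 'user', content: 'Say this is a test' }],
  stream: true,
});

for await (const part of stream) {
  process.stdout.write(part.choices[0]?.delta?.content || '');
}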

robegamesios commented 1 year ago

I get the parts per word; is there a way to get the parts by sentence, or at a set interval (e.g. every 5 seconds)? Thanks.

rattrayalex commented 1 year ago

@robegamesios that's a question better suited for the community forum: https://community.openai.com (though the short answer is no)

ashubham commented 1 year ago

Well, in most cases people use the openai client on the server side and forward the stream to the client. They then cannot use openai-node in the browser/client, because the stream is actually coming from their own backend. For this, we are releasing fetch-stream-observable, which enables this use case in a simple way.

rattrayalex commented 1 year ago

That looks potentially handy; please keep in mind that API Keys should not be used in the browser: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety

ManInTheWind commented 1 year ago

You can use stream: true in our upcoming v4 like so:

const stream = await client.completions.create({
  prompt: 'Say this is a test',
  model: 'text-davinci-003',
  stream: true,
});

for await (const part of stream) {
  process.stdout.write(part.choices[0]?.text || '');
}

A full example is here: https://github.com/openai/openai-node/blob/v4/examples/demo.ts

We expect to add additional conveniences for working with completion streams in the near future.

If you encounter any problems, please open a new issue!

EDIT: there's a bug for this in beta.3; beta.2 or beta.4 (which should be released in a few hours) will work in most environments.

I feel that this stream is not good enough: how do I know when the stream has ended, and how do I handle a stream error?

rattrayalex commented 1 year ago

Like this:

try {
  for await (const part of stream) {
    process.stdout.write(part.choices[0]?.text || '');
  }
  console.log('The stream is over.')
} catch (err) {
  console.error('The stream had an error', err)
}
huytool157 commented 12 months ago

How do I get the token count once the stream is complete? I don't see it in the example. The new API is much nicer, btw.

rattrayalex commented 12 months ago

As mentioned earlier in the thread:

usage data is not currently available in streamed responses, we may look into adding this in the future

4vanger commented 10 months ago

You can use stream: true in our upcoming v4 like so:

Neat! However, it is really non-trivial to make it work with express.js - I hacked a solution using a custom ReadableStream implementation to pipe into the response, but I feel that's not the easiest solution. Adding another example for streaming responses from server to client would be very helpful!
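A rough Express sketch along these lines, assuming the v4 SDK and a client that reads the raw text stream (route, model, and port are placeholders):

import express from 'express';
import OpenAI from 'openai';

const app = express();
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

app.post('/chat', express.json(), async (req, res) => {
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  res.setHeader('Cache-Control', 'no-cache');

  const stream = await openai.chat.completions.create({
    model: 'gpt-3.5-turbo',
    messages: req.body.messages,
    stream: true,
  });

  for await (const part of stream) {
    res.write(part.choices[0]?.delta?.content || '');
  }
  res.end();
});

app.listen(3000);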

Pedroglp commented 9 months ago

Finally I was able to make it work using SvelteKit:

export const POST = (async ({ request }) => {
    const payload = await request.json();
    let streamOpenAI: Stream<OpenAI.Chat.Completions.ChatCompletionChunk>;

    try {
        streamOpenAI = await openai.chat.completions.create({
            model: "gpt-4",
            messages: [{
                role: "user",
                content: `your prompt`
            }],
            stream: true,
        });
    } catch (e) {
        throw error(500, 'Failed creating data stream.');
    }

    let response = new ReadableStream({
        async start(controller) {
            try {
                for await (const part of streamOpenAI) {
                    controller.enqueue(part.choices[0]?.delta.content || '');
                }
                controller.close();
                return;
            } catch (err) {
                controller.close();
                throw error(500, 'Error while processing data stream.')
            }
        },
    })

    return new Response(response, {
        status: 200,
        headers: {
          'content-type': 'text/event-stream',
          'Connection': 'keep-alive'
        }
    });
}) satisfies RequestHandler;
rattrayalex commented 9 months ago

@Pedroglp you may be interested in .toReadableStream() and .fromReadableStream() – see examples/stream-to-client-next.ts for one example

Frankjunyulin commented 9 months ago

@rattrayalex Hi Alex, I followed the references you shared, but got following error:

[13:15:43.316] ERROR (58268): We are sorry, something goes wrong: Converting circular structure to JSON
    --> starting at object with constructor 'ReadableStream'
    |     property '_readableStreamController' -> object with constructor 'ReadableStreamDefaultController'
    --- property '_controlledReadableStream' closes the circle

Don't know how to figure it out, could you help?

In my backend:

export default async (
  req: NextApiRequest,
  res: NextApiResponse
): Promise<void> => {
...

  const completion = await openaiApi().chat.completions.create({
    model: OPENAI_API_MODEL,
    temperature: 0,
    openaiMessages,
    stream: true,
  })
  if (completion) {
    const message = {
      content: completion.toReadableStream(),
      role: "assistant",
      createdAt: new Date().toISOString(),
    };
    /*
    await prisma.conversation.update({
      where: {id: conversation.id},
      data: {
        messages: {
          push: message,
        },
      },
    });
    */

    res.status(200).json({conversationId: conversation.id, message});
  } else {
    res.status(200).json({conversationId: conversation.id});
  }

In the Frontend:

const data = await response.json();
      const runner = ChatCompletionStreamingRunner.fromReadableStream(
        data.message
      );
      runner.on("content", (delta, snapshot) => {
        console.log(delta);
      });
      console.dir(await runner.finalChatCompletion(), {depth: null});
    } catch (error) {
      // Consider implementing your own error handling logic here
      logger.error(error);
      // alert(error.message);
    }
rattrayalex commented 9 months ago

@Frankjunyulin you can't send a stream within a json response body, the stream must be the entire response body. Your code should look more like this:

export default async (
  req: NextApiRequest,
  res: NextApiResponse
): Promise<void> => {
  // ...

  const stream = await openaiApi().beta.chat.completions.stream({
    model: OPENAI_API_MODEL,
    temperature: 0,
    openaiMessages,
    stream: true,
  })

  stream.on('message', async (message) => {
    await prisma.conversation.update({
      where: {id: conversation.id},
      data: {
        messages: {
          push: message,
        },
      },
    });
  })

  return Response(stream.toReadableStream()) // this is how it works in app router, not sure the exact syntax for this in pages router…
}

and on the frontend, you want more like this:

const runner = ChatCompletionStreamingRunner.fromReadableStream(
  response.body
);
runner.on("content", (delta, snapshot) => {
  console.log(delta);
});
console.dir(await runner.finalChatCompletion(), {depth: null});