openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

How to use stream: true? #18

Closed: raphaelrk closed this issue 1 year ago

raphaelrk commented 2 years ago

I'm a bit lost as to how to actually use stream: true in this library.

Example of (incorrect) syntax I tried:

const res = await openai.createCompletion({
  model: "text-davinci-002",
  prompt: "Say this is a test",
  max_tokens: 6,
  temperature: 0,
  stream: true,
});

res.onmessage = (event) => {
  console.log(event.data);
}
george-i commented 1 year ago

kudos to @ponytojas for the regex const delta = decoder.decode(value).match(/"delta":\s*({.*?"content":\s*".*?"})/)?.[1]
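In context, that one-liner slots into a fetch-based reader loop roughly like this (a minimal sketch; response is assumed to be a fetch() result requested with stream: true, and, as @fracergu notes below, .match() only captures the first delta in a chunk):

const reader = response.body!.getReader()
const decoder = new TextDecoder('utf-8')

const { value, done } = await reader.read()
if (!done && value) {
  // Grab the first delta object in the chunk and parse out its content
  const delta = decoder.decode(value).match(/"delta":\s*({.*?"content":\s*".*?"})/)?.[1]
  if (delta) console.log(JSON.parse(delta).content)
}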

fracergu commented 1 year ago

Following up on @ponytojas's solution: I have found that, most of the time, the first response contains several token blocks, so when the regular expression is applied it only takes the first one and discards the others.


I've added this function for decoding:


const utf8Decoder = new TextDecoder('utf-8')

const decodeResponse = (response?: Uint8Array) => {
  if (!response) {
    return ''
  }

  const pattern = /"content"\s*:\s*"([^"]*)"/g
  const decodedText = utf8Decoder.decode(response)
  const matches: string[] = []

  let match
  while ((match = pattern.exec(decodedText)) !== null) {
    matches.push(match[1])
  }

  return matches.join('')
}

And I used it in the read() function of that approach, also removing the JSON.parse() call:

  ...
  async function read() {
    const { value, done } = await reader.read()

    if (done) return onText(fullText)

    const delta = decodeResponse(value)

    if (delta) {
      fullText += delta

      // Detect punctuation; if found, fire onText at most once every 0.5 s
      if (/[\p{P}\p{S}]/u.test(delta)) {
        const now = Date.now()

        if (now - lastFire > 500) {
          lastFire = now
          onText(fullText)
        }
      }
    }

    await read()
  }
  ...

Now I'm getting all the tokens:


I hope you find it helpful.

tsenguunchik commented 1 year ago

@fracergu Great solution, but the regex doesn't work when the content contains an escaped double quote. For "content": "\"" it just shows \ instead of ", which is wrong.

fracergu commented 1 year ago

@tsenguunchik Yes, I posted too quickly and I'm struggling with exactly that right now. I didn't see the problem until I started requesting code and it began failing on double quotes. I will update here when I have a solution.

EDIT: Problem solved. I restored the original regex and JSON parsing, and now it works quite well:

const decodeResponse = (response?: Uint8Array) => {
  if (!response) {
    return ''
  }

  const pattern = /"delta":\s*({.*?"content":\s*".*?"})/g
  const decodedText = utf8Decoder.decode(response)
  const matches: string[] = []

  let match
  while ((match = pattern.exec(decodedText)) !== null) {
    matches.push(JSON.parse(match[1]).content)
  }
  return matches.join('')
}

It also no longer loses the first few tokens that arrive packed into a single chunk.


fracergu commented 1 year ago

Here is the custom hook I'm using with React, in case you find it useful. setInputMessages receives the list of messages from which we expect a completion and triggers the fetch. partialText returns the partial text of the response as it is received in real time, and fullText returns the response once it is complete. The types I use are at the beginning of the file. It still lacks error handling, as it is under development. Apologies for any bad practice or incorrectness, as I'm fairly new to React.

import { useState, useEffect } from 'react'

enum Role {
  ASSISTANT = 'assistant',
  USER = 'user',
}

type Message = {
  role: Role
  content: string
}

const API_URL = 'https://api.openai.com/v1/chat/completions'
const OPENAI_API_KEY = import.meta.env.VITE_OPENAI_API_KEY
const OPENAI_CHAT_MODEL = 'gpt-3.5-turbo'

const utf8Decoder = new TextDecoder('utf-8')

const decodeResponse = (response?: Uint8Array) => {
  if (!response) {
    return ''
  }

  const pattern = /"delta":\s*({.*?"content":\s*".*?"})/g
  const decodedText = utf8Decoder.decode(response)
  const matches: string[] = []

  let match
  while ((match = pattern.exec(decodedText)) !== null) {
    matches.push(JSON.parse(match[1]).content)
  }
  return matches.join('')
}

export const useStreamCompletion = () => {
  const [partialText, setPartialText] = useState('')
  const [fullText, setFullText] = useState('')
  const [inputMessages, setInputMessages] = useState<Message[]>([])

  useEffect(() => {
    if (!inputMessages.length) return

    // Create the controller inside the effect so each request gets its own
    // instance and the cleanup below aborts the matching one
    const abortController = new AbortController()

    const onText = (text: string) => {
      setPartialText(text)
    }

    const fetchData = async () => {
      try {
        const response = await fetch(API_URL, {
          method: 'POST',
          headers: {
            Authorization: `Bearer ${OPENAI_API_KEY}`,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            messages: inputMessages,
            model: OPENAI_CHAT_MODEL,
            stream: true,
          }),
          signal: abortController.signal, // assign the abort controller signal to the fetch request
        })

        if (!response.ok) {
          const error = await response.json()
          throw new Error(error.error)
        }

        if (!response.body) throw new Error('No response body')

        const reader = response.body.getReader()

        let fullText = ''

        async function read() {
          const { value, done } = await reader.read()

          if (done) return onText(fullText)

          const delta = decodeResponse(value)

          if (delta) {
            fullText += delta
            onText(fullText.trim())
          }

          await read()
        }

        await read()

        setFullText(fullText)
      } catch (error) {
        console.error(error)
      }
    }

    fetchData()

    return () => {
      abortController.abort()
    }
  }, [inputMessages])

  return { partialText, fullText, setInputMessages }
}
darknoon commented 1 year ago

I found this code from Hassan at Vercel helpful for streaming the OpenAI API in an edge function

shezhangzhang commented 1 year ago

@darknoon Yes, it works. But I filed an issue against it because it didn't handle the error chunk: https://github.com/Nutlope/twitterbio/issues/25

shezhangzhang commented 1 year ago

Does anybody know why the chunks get split up when I deploy it to a Vercel edge function?

UncaughtCursor commented 1 year ago

@shezhangzhang This chunk-split issue has only happened to me once so far in an hour of testing. I'm running a Node.js instance on localhost right now, not on Vercel. Perhaps it's a rare occurrence, but an occurrence we have to account for nonetheless.
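For what it's worth, one way to account for mid-message splits is to buffer partial events across reads and only parse complete ones (a sketch; it assumes the SSE format used in the comments above, with events separated by a pair of newlines):

// Accumulates raw text until a full `data: ...` event is available.
let buffer = ''

function handleChunk(chunk: string, onToken: (token: string) => void) {
  buffer += chunk
  const events = buffer.split('\n\n')
  buffer = events.pop() ?? '' // keep the trailing, possibly incomplete event

  for (const event of events) {
    const data = event.replace(/^data:\s*/, '').trim()
    if (!data || data === '[DONE]') continue
    const token = JSON.parse(data).choices[0].delta?.content
    if (token) onToken(token)
  }
}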

shezhangzhang commented 1 year ago

> @shezhangzhang This chunk-split issue has only happened to me once so far in an hour of testing. [...] Perhaps it's a rare occurrence, but an occurrence we have to account for nonetheless.

Yup, I haven't figured it out. When I deployed it on Vercel, the chunk-split issue could happen on every request. I don't know the reason.

KaleRakker commented 1 year ago

For me, this doesn't solve the issue with the unescaped " character. Any suggestions?

EDIT:

This happens with the following input:

data: {"id":"chatcmpl-6wbcJjX6ttujYAT6rFv8eZMxS2daD","object":"chat.completion.chunk","created":1679425643,"model":"gpt-3.5-turbo-0301","choices":[{"delta":{"content":"\"}"},"index":0,"finish_reason":null}]}


george-i commented 1 year ago

I have a problem with encoding when making requests.

For this text:

Remove citation from this text: Gardening can be defined as an activity in a garden setting to grow, cultivate, and look after plants (e.g., flowers, vegetables) for non-commercial use (Gillard, 2001, p. 832; Kingsley et al., 2021). There is some evidence to suggest gardening is a moderately intense physical activity ranging from low-to moderate-intensity for older age groups (>63 years; Park et al., 2008; Park et al., 2011) to moderate-to high-intensity in younger adults (>20 years; Park et al., 2014)

The request fails before being sent, around this part of the text:

(>20 years

I tried with encodeURIComponent and it works fine, but in return the API says, among other things:

just a reminder to please use proper formatting

Furthermore, sometimes the response itself comes back encoded.
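For what it's worth, if the request body is built with JSON.stringify, characters like ( and > should be escaped correctly without encodeURIComponent, which is meant for URL query strings rather than JSON bodies (a minimal sketch):

// JSON.stringify produces a valid JSON body; no manual escaping is needed.
const body = JSON.stringify({
  model: 'gpt-3.5-turbo',
  messages: [
    { role: 'user', content: 'Remove citation from this text: ... (>20 years; Park et al., 2014)' },
  ],
  stream: true,
})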

FurriousFox commented 1 year ago

I'm not sure if this is useful for you guys, but here's my modified version of eventsource, with added support for setting the method and payload/body, which should be sufficient for a fully featured SSE connection to OpenAI: https://gist.github.com/FurriousFox/f43eaf9645302e51ab01cf0b1853aa4e

something like this should then work

const EventSource = require("./eventsource.js");

let es = new EventSource("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${api_key}`
    },
    payload: JSON.stringify({
        "model": "gpt-3.5-turbo",
        "messages": [{
            "role": "system",
            "content": instruction
        }, {
            "role": "user",
            "content": prompt
        }],
        stream: true,
    }),
});
es.onmessage = (e) => {
    if (e.data == "[DONE]") {
        es.close();
    } else {
        let delta = JSON.parse(e.data).choices[0].delta.content;
        if (delta) {
            console.log(delta);
        }
    }
};
syonfox commented 1 year ago

OK, just my two cents: this smells like network flakiness, dropped packets, or the server batching output, and I suspect it's one of the reasons OpenAI responses sometimes fail in chat.openai.com. The solution may involve a library that parses the input very flexibly, joining text and accepting malformed input in the most forgiving way possible. (If I remember correctly, jsmn is a minimal JSON parser in C; if anyone has a super clean implementation of this idea, I'd be interested, PRs welcome.) Anyhow, I think the regex approach a ways above is probably the best middle ground, since they nerfed the response stats anyway. Happy coding.


danneu commented 1 year ago

There are examples above that show how to consume the token stream with events and callbacks, but a potentially simpler alternative is to use an async iterator that yields tokens, even if you ultimately just want to push each token into a stream or callback anyway.

Usage:

const messages = [
    { role: 'user', content: 'what is the meaning of life?' }
]

let answer = ''
for await (const token of streamChatCompletion(messages)) {
    answer += token
    // do something async here if you want
}
console.log('answer finished:', answer)

The implementation is easy because an Axios response set to streaming already lets you consume its chunks with an async iterator:

const { OpenAIApi } = require('openai')
const openai = new OpenAIApi(...)

async function* streamChatCompletion(messages) {
    const response = await openai.createChatCompletion(
        {
            model: 'gpt-3.5-turbo',
            messages,
            stream: true,
        },
        {
            responseType: 'stream',
        },
    )

    for await (const chunk of response.data) {
        const lines = chunk
            .toString('utf8')
            .split('\n')
            .filter((line) => line.trim().startsWith('data: '))

        for (const line of lines) {
            const message = line.replace(/^data: /, '')
            if (message === '[DONE]') {
                return
            }

            const json = JSON.parse(message)
            const token = json.choices[0].delta.content
            if (token) {
                yield token
            }
        }
    }
}

You can see this impl in my Telegram bot: https://github.com/danneu/telegram-chatgpt-bot/blob/24b76f880094b87a5c0a9a42c3571bbecfb12caa/openai.ts#L25

Or, instead of returning an async iterator, you can replace yield token with stream.push(token) or onToken(token) or whatever makes the most sense for your app (remembering to change function* back to function).
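For reference, the callback variant might look roughly like this (same parsing as the generator above, with yield token swapped for an onToken callback; the cast works around the SDK typing response.data as a plain object, a quirk discussed a few comments below):

async function streamChatCompletion(messages: any[], onToken: (token: string) => void) {
    const response = await openai.createChatCompletion(
        { model: 'gpt-3.5-turbo', messages, stream: true },
        { responseType: 'stream' },
    )

    // The SDK types .data as a parsed response, so cast to the stream we get.
    const stream = response.data as unknown as AsyncIterable<Buffer>

    for await (const chunk of stream) {
        const lines = chunk
            .toString('utf8')
            .split('\n')
            .filter((line) => line.trim().startsWith('data: '))

        for (const line of lines) {
            const message = line.replace(/^data: /, '')
            if (message === '[DONE]') return

            const token = JSON.parse(message).choices[0].delta.content
            if (token) onToken(token) // or stream.push(token), etc.
        }
    }
}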

drorm commented 1 year ago

I released https://github.com/drorm/gish, which has a full example of streaming. If you just want to see it in action, click the screencast in the README. The code is at https://github.com/drorm/gish/blob/c88d39fcc97150d3107cf84eca3051fa3b18cd14/src/LLM.ts#L103, based on a lot of the info in here. Thank you. My license is MIT, so steal at your pleasure :-).

Christopher-Hayes commented 1 year ago

@danneu great solution, thank you for sharing.

justinmahar commented 1 year ago
> for await (const chunk of response.data) {

@danneu @Christopher-Hayes VSCode complains on this line with Type 'CreateChatCompletionResponse' must have a '[Symbol.asyncIterator]()' method that returns an async iterator. Is there something else you're doing to ensure you get an async iterator? Hacking it with an any typecast doesn't magically fix things either.

Christopher-Hayes commented 1 year ago
> @danneu @Christopher-Hayes VSCode complains on this line with Type 'CreateChatCompletionResponse' must have a '[Symbol.asyncIterator]()' method that returns an async iterator. [...]

Yeah, I don't know why TS was giving that error, but I just cast to unknown and then to the async iterator type it wants.

I used the code here if you want an example implementation: https://github.com/Christopher-Hayes/vscode-chatgpt-reborn/blob/3c191f34b52e1473a171531da42177857f15304c/src/api-provider.ts#L42
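In other words, something along these lines (a sketch; the element type you cast to is your choice):

// Go through unknown, since the SDK types .data as the parsed response.
const stream = response.data as unknown as AsyncIterable<Buffer>

for await (const chunk of stream) {
  // ...parse the SSE lines as in the earlier comments
}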

lassegit commented 1 year ago

Does there exist a way to safely JSON.parse the response and ensure that you get all the content provided by OpenAI? The methods above all fail in one way or another, which is particularly problematic when generating code snippets, but also markdown.

josephrocca commented 1 year ago

I'm using code that's roughly like this:

let response = await fetch("https://api.openai.com/v1/engines/davinci/completions",
  {
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer " + OPENAI_KEY,
    },
    method: "POST",
    body: JSON.stringify({
      prompt: selected,
      temperature: 0.75,
      top_p: 0.95,
      max_tokens: 10,
      stream: true,
      stop: ["\n\n"],
    }),
  }
);

const reader = response.body?.pipeThrough(new TextDecoderStream()).getReader();

while (true) {
  const res = await reader?.read();
  if (res?.done) break;
  console.log(res?.value);
}

Look at the console.log output to see the format: you have to trim the data: prefix off each line and then JSON.parse() the rest.
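Roughly, inside the read loop that would be (a sketch; it assumes each value holds whole data: lines, which, per the chunk-split discussion above, is not guaranteed):

for (const line of res?.value?.split('\n') ?? []) {
  if (!line.startsWith('data: ')) continue
  const payload = line.slice('data: '.length).trim()
  if (payload === '[DONE]') break
  // This example hits the completions API, so the token lives in .text
  const token = JSON.parse(payload).choices?.[0]?.text
  if (token) console.log(token)
}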

YoseptF commented 1 year ago

I've no idea why none of the answers posted before worked for me; what ended up working is something similar to what https://github.com/openai/openai-node/issues/18#issuecomment-1352010682 said.

So in case someone else is still looking for other options, here's what ended up working for me:

 const chat = async (
    prompt: string,
    previousChats: IpreviousChats[],
  ) => {
    const apiKey = window.localStorage.getItem(LOCAL_STORAGE_KEY);
    const url = "https://api.openai.com/v1/chat/completions";

    const xhr = new XMLHttpRequest();
    xhr.open("POST", url);
    xhr.setRequestHeader("Content-Type", "application/json");
    xhr.setRequestHeader("Authorization", "Bearer " + apiKey);

    xhr.onprogress = function(event) {
      console.log("Received " + event.loaded + " bytes of data.");
      console.log("Data: " + xhr.responseText);
      const newUpdates = xhr.responseText
      .replace("data: [DONE]", "")
      .trim()
      .split('data: ')
      .filter(Boolean)

      const newUpdatesParsed = newUpdates.map((update) => {
        const parsed = JSON.parse(update);
        return parsed.choices[0].delta?.content || '';
      }
      );

      const newUpdatesJoined = newUpdatesParsed.join('')
      console.log('current message so far',newUpdatesJoined);
    };

    xhr.onreadystatechange = function() {
      if (xhr.readyState === 4) {
        if (xhr.status === 200) {
          console.log("Response complete.");
          console.log("Final data: " + xhr.responseText);
        } else {
          console.error("Request failed with status " + xhr.status);
        }
      }
    };

    const data = JSON.stringify({
      model: currentChatModelRef.current,
      messages: [
        ...previousChats,
        {
          role: "user",
          content: prompt,
        }],
      temperature: 0.5,
      frequency_penalty: 0,
      presence_penalty: 0,
      stream: true,
    });

    xhr.send(data);
  }

https://user-images.githubusercontent.com/44252641/228965639-541e8917-d0c1-4b4b-8a63-d8bd216db272.mp4


good luck everyone :DDD

lassegit commented 1 year ago

@josephrocca Still sometimes getting the JSON.parse error in production using Vercel Edge Runtime.

edelauna commented 1 year ago

For server-side TypeScript, you can try:

import { IncomingMessage } from 'http'

const response = await openai.createChatCompletion({
    model: "gpt-3.5-turbo",
    messages: messages,
    stream: true,
}, { responseType: 'stream' });

const stream = response.data as unknown as IncomingMessage

stream.on('data', (chunk: Buffer) => {
   // Messages in the event stream are separated by a pair of newline characters.
   const payloads = chunk.toString().split("\n\n")
   for (const payload of payloads) {
       if (payload.includes('[DONE]')) return;
       if (payload.startsWith("data:")) {
           const data = payload.replaceAll(/(\n)?^data:\s*/g, ''); // in case there's multiline data event
           try {
               const delta = JSON.parse(data.trim())
               console.log(delta.choices[0].delta?.content)
           } catch (error) {
               console.log(`Error with JSON.parse and ${payload}.\n${error}`)
           }
       }
   }
})

stream.on('end', () => console.log('Stream done'))
stream.on('error', (e: Error) => console.error(e))
devilyouwei commented 1 year ago

This is my approach:

Create util.ts

// import what you need in util.ts
import {
    OpenAIApi,
    Configuration,
    ChatCompletionRequestMessage,
    CreateChatCompletionResponse,
} from 'openai'
import type { ResponseType } from 'axios' // for the responseType switch below

// config openAI sdk
const openai = new OpenAIApi(
    new Configuration({
        apiKey: process.env.OPENAI_API_KEY,
        basePath: process.env.OPENAI_PROXY
    })
)

// write a util function, chat
export default {
    async chat(messages: ChatCompletionRequestMessage[], stream: boolean = false) {
        const responseType: ResponseType = stream ? 'stream' : 'json'
        return (
            await openai.createChatCompletion(
                {
                    model: 'gpt-3.5-turbo',
                    messages,
                    stream
                },
                { responseType }
            )
        ).data
    }
}

Install json detector

yarn add @stdlib/assert-is-json

Use util.ts in a controller or service ware

import openai from './util'
import isJSON from '@stdlib/assert-is-json'
import { IncomingMessage } from 'http'
import { ChatCompletionRequestMessageRoleEnum } from 'openai'

async chatStream(content: string, callback: CreateChatCompletionStreamResponseCallback) {
        const role = ChatCompletionRequestMessageRoleEnum.User
        // Transfer to IncomingMessage type, this is a Stream type
        const res = ((await openai.chat([{ role, content }], true)) as any) as IncomingMessage
        let tmp = '' // cache, store temporary string data
        res.on('data', (data: Buffer) => {
            // buffer to utf8 string, then split to string data array
            const message = data
                .toString('utf8')
                .split('\n')
                .filter(m => m.length > 0)
            for (const item of message) {
                // remove the first head word: 'data: '
                tmp += item.replace(/^data: /, '')
                // only when tmp string is a json, you transfer to object and callback it
                if (isJSON(tmp)) {
                    const data: CreateChatCompletionStreamResponse = JSON.parse(tmp)
                    tmp = ''
                    callback(data)
                }
            }
        })
    }

Interface file for openAI, Interface.ts

interface CreateChatCompletionStreamResponse {
    id: string
    object: string
    created: number
    model: string
    choices: Array<CreateChatCompletionStreamResponseChoicesInner>
}

interface CreateChatCompletionStreamResponseChoicesInner {
    delta: { role?: string; content?: string }
    index: number
    finish_reason: string
}

type CreateChatCompletionStreamResponseCallback = (response: CreateChatCompletionStreamResponse) => void
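A hypothetical call site for that chatStream method (service stands in for wherever the method ends up living):

// Illustrative usage: print tokens to stdout as the stream arrives.
await service.chatStream('Hello, how are you?', (response: CreateChatCompletionStreamResponse) => {
    const token = response.choices[0].delta.content
    if (token) process.stdout.write(token)
})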
wrsulliv commented 1 year ago

Web Client Streaming

I was trying to access the OpenAI API from a web client, and none of the solutions worked except @YoseptF's: https://github.com/openai/openai-node/issues/18#issuecomment-1490970390

I believe it has to do with the client-side Axios implementation, but I may be wrong. Either way, thanks @YoseptF!

frankgreco commented 1 year ago

Here's my contribution 👇🏼

// Initiate the stream request.
const response = await fetch(/* <url> */, {
  method: 'POST',
  headers: { /* <headers> */ },
  body: JSON.stringify({
    ...{ stream: true },
    ...restOfBody
  })
});

if (!response?.body) {
  throw new Error('The response from OpenAI does not contain a body.');
}

// OpenAI responses seem to always begin with two newlines. We'll ignore those.
let noise = true;

// @ts-ignore
for await (const message of response.body) {
  for (const chunk of message.toString().split('\n\n')) {

    // https://github.com/openai/openai-node/issues/18#issuecomment-1369996933
    // https://github.com/openai/openai-node/issues/18#issuecomment-1493132878
    const msg: string = chunk.replace(/^data: /, '')

    // The stream is done?
    // https://platform.openai.com/docs/api-reference/chat/create
    if (msg === '[DONE]' || msg.length === 0) {
      continue;
    }

    let parsed: any;
    try {
      parsed = JSON.parse(msg.trim());
    } catch (e) {
      throw new Error(`Could not parse OpenAI response (${msg}).`);
    }

    let choice: string;
    try {
      choice = parsed?.choices?.[0]?.text;
    } catch (e) {
      throw new Error(`Could not dereference OpenAI message format (${parsed}).`);
    }

    if (noise && choice === '\n') {
      continue;
    }

    noise = false;
    // your final message will be here.
  }
}

NOTE: Sending { stop: ['\n\n'] } as part of the request did not work for me.

justinsteven commented 1 year ago

Using a pattern similar to that shown in https://github.com/openai/openai-node/issues/18#issuecomment-1493132878 in a client-side React app I'm getting stream.on is not a function

I assume this is because Axios appears not to support { responseType: 'stream' } in the browser, only in server-side Node

See https://stackoverflow.com/a/60117409

shreypjain commented 1 year ago

Yeah, @justinsteven, I'm running into the same issue using client-side React to build something simple. Here is the best workaround I could find, but I'll show you the issue with it in just a second.

I would recommend attaching this to a button click in React and having your state update based on it:

await axios.post(
        "https://api.openai.com/v1/chat/completions",
        {
          messages: newMessages,
          stream: true,
          model: "gpt-3.5-turbo",
        },
        {
          headers: {
            Authorization: "Bearer " + getOpenAISK(),
          },
          onDownloadProgress: (event) => {
            const payload = event.currentTarget.response;

            const result = payload
              .replace(/data:\s*/g, "")
              .replace(/[\r\n\t]/g, "")
              .split("}{")
              .join("},{");
            const cleanedJsonString = `[${result}]`;

            if (payload.includes("[DONE]")) return;

            try {
              const parsedJson: [] = JSON.parse(cleanedJsonString);
              // console.log(JSON.stringify(parsedJson, null, 2));

              let newContent: string = ""
              parsedJson.forEach((item: any) => {
                if (
                  item.choices &&
                  item.choices.length > 0 &&
                  item.choices[0].delta &&
                  item.choices[0].delta.content
                ) {
                  newContent += item.choices[0].delta.content;
                }
              });

              if (newContent !== laggingLatestMessage) {
                const extraContent = newContent.slice(
                  laggingLatestMessage.length
                );
                laggingLatestMessage += extraContent;
                setLatestMessage(laggingLatestMessage);
              }
            } catch (e) {
              setIsGenerating(false);
              console.log("error parsing json", e);
            }
          },
          responseType: "stream",
        }
      );

Unfortunately, sometimes the output comes out looking ugly, but I believe this is the best workaround when using an axios stream in client-side React.


justinmahar commented 1 year ago

📦 Client and server side streaming solution via npm

Hey everyone! After some tinkering, I've created a working client- and server-side solution for this. The GitHub project is here, and you can try the client/browser demo here. (The demo is in React, but the solution is framework agnostic.)

You can now drop in support for streaming chat completions in both the server (Node.js) and client (browser) via the npm package openai-ext. This solution was inspired by everyone's work above, especially @YoseptF and @edelauna. Thanks everyone for working on this together.

This solution supports stopping completions, too.

Full usage examples below.

demo


To install via npm:

npm i openai-ext@latest

Browser / Client

๐Ÿ‘๏ธ View live demo

Use the following solution in a browser environment:

import { OpenAIExt } from "openai-ext";

// Configure the stream (use type ClientStreamChatCompletionConfig for TypeScript users)
const streamConfig = {
  apiKey: `123abcXYZasdf`, // Your API key
  handler: {
    // Content contains the string draft, which may be partial. When isFinal is true, the completion is done.
    onContent(content, isFinal, xhr) {
      console.log(content, "isFinal?", isFinal);
    },
    onDone(xhr) {
      console.log("Done!");
    },
    onError(error, status, xhr) {
      console.error(error);
    },
  },
};

// Make the call and store a reference to the XMLHttpRequest
const xhr = OpenAIExt.streamClientChatCompletion(
  {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Tell me a funny joke." },
    ],
  },
  streamConfig
);
// If you'd like to stop the completion, call xhr.abort(). The onDone() handler will be called.
xhr.abort();

Node.js / Server

Use the following solution in a Node.js or server environment:

import { Configuration, OpenAIApi } from 'openai';
import { OpenAIExt } from "openai-ext";

const apiKey = `123abcXYZasdf`; // Your API key
const configuration = new Configuration({ apiKey });
const openai = new OpenAIApi(configuration);

// Configure the stream (use type ServerStreamChatCompletionConfig for TypeScript users)
const streamConfig = {
  openai: openai,
  handler: {
    // Content contains the string draft, which may be partial. When isFinal is true, the completion is done.
    onContent(content, isFinal, stream) {
      console.log(content, "isFinal?", isFinal);
    },
    onDone(stream) {
      console.log('Done!');
    },
    onError(error, stream) {
      console.error(error);
    },
  },
};

const axiosConfig = {
  // ...
};

// Make the call to stream the completion
OpenAIExt.streamServerChatCompletion(
  {
    model: 'gpt-3.5-turbo',
    messages: [
      { role: 'system', content: 'You are a helpful assistant.' },
      { role: 'user', content: 'Tell me a funny joke.' },
    ],
  },
  streamConfig,
  axiosConfig
);

If you'd like to stop the completion, call stream.destroy(). The onDone() handler will be called.

const response = await OpenAIExt.streamServerChatCompletion(...);
const stream = response.data;
stream.destroy();

You can also stop completion using an Axios cancellation in the Axios config.
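For example, with an AbortController passed through the Axios config (a sketch; Axios accepts an AbortSignal since v0.22, and request/streamConfig are as in the server example above):

const controller = new AbortController()

const axiosConfig = {
  signal: controller.signal, // Axios cancels the request when this fires
}

OpenAIExt.streamServerChatCompletion(request, streamConfig, axiosConfig)

// Later, to cancel the in-flight completion:
controller.abort()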


Let me know if you have any suggestions. PRs always welcome! I'll be adding a live demo to the project's Storybook site soon.

Update: I've added support for Node.js/server chat completion streaming. This library now supports both server and client. Woohoo 😎

Update 2: Live React demo now available -- view here

arthcmr commented 1 year ago

Nice work @justinmahar! Your solution is super easy to use and works great with Next.js and React apps 💯

jaankoppe commented 1 year ago

How can I implement this so that the OpenAI API request is made from the backend? I do not want to expose the API key on the client side.

shreypjain commented 1 year ago

Hey @jaankoppe, unfortunately you wouldn't be able to use this server-side (it's more of a client-side solution for playing around with a local ChatGPT bot). I would recommend using the solutions above, as they should all work with axios and server-side chat completions if needed. Use this solution as a reference:

https://github.com/openai/openai-node/issues/18#issuecomment-1493132878

justinmahar commented 1 year ago

@jaankoppe @shreypjain I've updated the solution to support both server and client streaming. Give it a shot and let us know how it works for you - https://github.com/openai/openai-node/issues/18#issuecomment-1509225450

Yafaa commented 1 year ago

I am running those examples but keep getting TypeError: completion.data.on is not a function. I tried updating my Node version to the latest, but same result.

justinmahar commented 1 year ago

@Yafaa Which environment are you in (Node.js or browser)? Are you using the correct call for that environment?

The latest version will throw an error when used in the wrong environment. Try it out -- npm i openai-ext@latest

Yafaa commented 1 year ago

The example with openai-ext is fine, but with https://github.com/openai/openai-node/issues/18#issuecomment-1369996933 and https://github.com/openai/openai-node/issues/18#issuecomment-1371279689 it throws TypeError: completion.data.on is not a function.

lgh06 commented 1 year ago

I used the solution at https://github.com/PawanOsman/ChatGPT/blob/b705a2511b71cf2a6077db76a7048ddbca1ecbb1/routes.js#L178 on the Next.js API side, and it worked.

UPDATE: on the browser frontend, Joseph's solution worked for me.

MDN DID NOT WORK: I am still working on the React side, following this MDN guide and this.

Higher-level wrappers fall short here; Node.js and browser native functions are the winners for OpenAI's streaming scenario.

willguest commented 1 year ago

I finally managed to get the stream working in Chrome, but not Firefox. The route I picked yields tokens with the async iterator and writes them as the body of the response, which is picked up as a readable stream.

index.js

setResult('');
  const response = await fetch("/api/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ 
      querySystem: islandSystemInput, 
      queryUser: islandUserInput 
    }),
  });

  const reader = response.body?.pipeThrough(new TextDecoderStream()).getReader();

  while (true) {
    const res = await reader?.read();
    if (res?.value?.toString() !== undefined){
      setResult(result => result + res?.value);
    }
    if (res?.done) break; 
  }
}

generate.js

async function* streamChatCompletion(messages) {
  const completion = await openai.createChatCompletion(
      {
          model: 'gpt-4-0314',
          messages: messages,
          max_tokens: 10,
          stream: true,
          stop: ["\n\n"],
      },
      {
          responseType: 'stream',
      },
  )

  for await (const chunk of completion.data) {
      const lines = chunk
          .toString('utf8')
          .split('\n')
          .filter((line) => line.trim().startsWith('data: '))

      for (const line of lines) {
          const message = line.replace(/^data: /, '')
          if (message === '[DONE]') {
              return
          }
          const json = JSON.parse(message)
          const token = json.choices[0].delta.content
          if (token) {
            yield token;
          }
      }
  }
}

My generate.js is mostly vanilla otherwise; it just calls the above function in a for await loop.

I hope this is helpful for someone.
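For anyone filling in the vanilla part, the surrounding handler presumably looks something like this (a guess based on the index.js above; only the for await loop is described):

export default async function handler(req, res) {
  const messages = [
    { role: 'system', content: req.body.querySystem },
    { role: 'user', content: req.body.queryUser },
  ]

  for await (const token of streamChatCompletion(messages)) {
    res.write(token) // the client reads this as a streamed response body
  }
  res.end()
}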

fukemy commented 1 year ago
> streamChatCompletion

Hi, I got this error:

Invalid attempt to iterate non-iterable instance.
In order to be iterable, non-array objects must have a [Symbol.iterator]() method

Can you help?

atonamy commented 1 year ago

> I finally managed to get the stream working in Chrome, but not Firefox. [...]

It will work in Firefox if you set the Content-Type header in the response of the /api/generate endpoint; this is a known issue that still hasn't been fixed.
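That is, something along these lines at the top of the /api/generate handler (the exact content type is illustrative):

// Declaring the Content-Type up front lets Firefox start consuming the
// body as a stream instead of waiting for the response to complete.
res.writeHead(200, {
  'Content-Type': 'text/plain; charset=utf-8',
  'Transfer-Encoding': 'chunked',
})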

MentalGear commented 1 year ago

OpenAI Streams Library

There's a new Node.js library built specifically for streams, which should render this problem solved. (Make sure to check it out, and maybe make a PR regarding Whisper.)

https://github.com/SpellcraftAI/openai-streams

MatchuPitchu commented 1 year ago

I've developed a custom hook in React with TypeScript to stream data from the OpenAI API. This solution is based on several approaches listed here in the comments: https://github.com/MatchuPitchu/open-ai

The custom hook allows streaming API responses, as well as canceling the response stream and resetting the message context. Additionally, the project supports code syntax highlighting, copying code snippets to the clipboard, and displaying additional metadata for each response.

import { useCallback, useState } from 'react';
import type { DeepRequired } from '@/utils/type-helpers';

export type GPT35 = 'gpt-3.5-turbo' | 'gpt-3.5-turbo-0301';
export type GPT4 = 'gpt-4' | 'gpt-4-0314' | 'gpt-4-32k' | 'gpt-4-32k-0314';
export type Model = GPT35 | GPT4;

export type ChatRole = 'user' | 'assistant' | 'system' | '';

export type ChatCompletionResponseMessage = {
  content: string; // content of the completion
  role: ChatRole; // role of the person/AI in the message
};

export type ChatMessageToken = ChatCompletionResponseMessage & {
  timestamp: number;
};

export type ChatMessageParams = ChatCompletionResponseMessage & {
  timestamp?: number; // timestamp of completed request
  meta?: {
    loading?: boolean; // completion state
    responseTime?: string; // total elapsed time between completion start and end
    chunks?: ChatMessageToken[]; // returned chunks of completion stream
  };
};

export type ChatMessage = DeepRequired<ChatMessageParams>;

export type ChatCompletionChunk = {
  id: string;
  object: string;
  created: number;
  model: Model;
  choices: {
    delta: Partial<ChatCompletionResponseMessage>;
    index: number;
    finish_reason: string | null;
  }[];
};

type RequestOptions = {
  headers: Record<string, string>;
  method: 'POST';
  body: string;
  signal: AbortSignal;
};

export type OpenAIStreamingProps = {
  apiKey: string;
  model: Model;
};

const OPENAI_COMPLETIONS_URL = 'https://api.openai.com/v1/chat/completions';
const MILLISECONDS_PER_SECOND = 1000;

const updateLastItem = <T>(currentItems: T[], updatedLastItem: T) => {
  const newItems = currentItems.slice(0, -1);
  newItems.push(updatedLastItem);
  return newItems;
};

// transform chat message structure with metadata to a limited shape that OpenAI API expects
const getOpenAIRequestMessage = ({ content, role }: ChatMessage): ChatCompletionResponseMessage => ({
  content,
  role
});

const getOpenAIRequestOptions = (
  apiKey: string,
  model: Model,
  messages: ChatMessage[],
  signal: AbortSignal
): RequestOptions => ({
  headers: {
    'Content-Type': 'application/json',
    Authorization: `Bearer ${apiKey}`
  },
  method: 'POST',
  body: JSON.stringify({
    model,
    messages: messages.map(getOpenAIRequestMessage),
    // TODO: define value: max_tokens: 100,
    stream: true
  }),
  signal
});

// transform chat message into a chat message with metadata
const createChatMessage = ({ content, role, meta }: ChatMessageParams): ChatMessage => ({
  content,
  role,
  timestamp: Date.now(),
  meta: {
    loading: false,
    responseTime: '',
    chunks: [],
    ...meta
  }
});

export const useOpenAIChatStream = ({ model, apiKey }: OpenAIStreamingProps) => {
  const [messages, setMessages] = useState<ChatMessage[]>([]);
  const [controller, setController] = useState<AbortController | null>(null);
  const [isLoading, setIsLoading] = useState<boolean>(false);

  const resetMessages = () => setMessages([]);

  const abortStream = () => {
    // abort fetch request by calling abort() on the AbortController instance
    if (!controller) return;
    controller.abort();
    setController(null);
  };

  const closeStream = (startTimestamp: number) => {
    // determine the final timestamp, and calculate the number of seconds the full request took.
    const endTimestamp = Date.now();
    const differenceInSeconds = (endTimestamp - startTimestamp) / MILLISECONDS_PER_SECOND;
    const formattedDiff = `${differenceInSeconds.toFixed(2)}s`;

    // update last entry of message list with the final details
    setMessages((prevMessages) => {
      const lastMessage = prevMessages.at(-1);
      if (!lastMessage) return [];

      const updatedLastMessage = {
        ...lastMessage,
        timestamp: endTimestamp,
        meta: {
          ...lastMessage.meta,
          loading: false,
          responseTime: formattedDiff
        }
      };

      return updateLastItem(prevMessages, updatedLastMessage);
    });
  };

  const submitPrompt = useCallback(
    async (newPrompt: ChatMessageParams[]) => {
      // a) no new request if last stream is loading
      // b) no request if empty string as prompt
      if (isLoading || !newPrompt[0].content) return;

      setIsLoading(true);

      const startTimestamp = Date.now();
      const chatMessages: ChatMessage[] = [...messages, ...newPrompt.map(createChatMessage)];

      const newController = new AbortController();
      const signal = newController.signal;
      setController(newController);

      try {
        const response = await fetch(
          OPENAI_COMPLETIONS_URL,
          getOpenAIRequestOptions(apiKey, model, chatMessages, signal)
        );

        if (!response.body) return;
        // read response as data stream
        const reader = response.body.getReader();
        const decoder = new TextDecoder('utf-8');

        // placeholder for next message that will be returned from API
        const placeholderMessage = createChatMessage({ content: '', role: '', meta: { loading: true } });
        let currentMessages = [...chatMessages, placeholderMessage];

        // eslint-disable-next-line no-constant-condition
        while (true) {
          const { done, value } = await reader.read();
          if (done) {
            closeStream(startTimestamp);
            break;
          }
          // parse chunk of data
          const chunk = decoder.decode(value);
          const lines = chunk.split(/(\n){2}/);

          const parsedLines: ChatCompletionChunk[] = lines
            .map((line) => line.replace(/(\n)?^data:\s*/, '').trim()) // remove 'data:' prefix
            .filter((line) => line !== '' && line !== '[DONE]') // remove empty lines and "[DONE]"
            .map((line) => JSON.parse(line)); // parse JSON string

          for (const parsedLine of parsedLines) {
            let chunkContent: string = parsedLine.choices[0].delta.content ?? '';
            chunkContent = chunkContent.replace(/^`\s*/, '`'); // avoid empty line after single backtick
            const chunkRole: ChatRole = parsedLine.choices[0].delta.role ?? '';

            // update last message entry in list with the most recent chunk
            const lastMessage = currentMessages.at(-1);
            if (!lastMessage) return;

            const updatedLastMessage = {
              content: `${lastMessage.content}${chunkContent}`,
              role: `${lastMessage.role}${chunkRole}` as ChatRole,
              timestamp: 0,
              meta: {
                ...lastMessage.meta,
                chunks: [
                  ...lastMessage.meta.chunks,
                  {
                    content: chunkContent,
                    role: chunkRole,
                    timestamp: Date.now()
                  }
                ]
              }
            };

            currentMessages = updateLastItem(currentMessages, updatedLastMessage);
            setMessages(currentMessages);
          }
        }
      } catch (error) {
        if (signal.aborted) {
          console.error(`Request aborted`, error);
        } else {
          console.error(`Error during chat response streaming`, error);
        }
      } finally {
        setController(null); // reset AbortController
        setIsLoading(false);
      }
    },
    [apiKey, isLoading, messages, model]
  );

  return { messages, submitPrompt, resetMessages, isLoading, abortStream };
};
zachariahtimothy commented 1 year ago

Inspired by @YoseptF, I was able to accomplish the same using just the openai SDK. This uses TypeScript, React, and Zustand.

import type { StateCreator } from 'zustand'
import type { CreateChatCompletionRequest } from 'openai'
// Note: openAi (the SDK instance), MainSlice, and selectedModelId are
// defined elsewhere in the app.

type ConversationMessage = CreateChatCompletionRequest["messages"][0] & {
  id?: string;
};

export interface ChatCompletionSlice {
  conversationMessages: ConversationMessage[];
  createChatCompletion: (
    request: Omit<CreateChatCompletionRequest, "model">
  ) => Promise<void>;
  resetConversation: () => void;
}

export const createChatCompletionSlice: StateCreator<
  MainSlice & ChatCompletionSlice,
  [],
  [],
  ChatCompletionSlice
> = (set, get) => ({
  conversationMessages: [],
  createChatCompletion: async (request) => {
    const { messages: requestMessages, stream, ...restRequest } = request;
    // Add users message in
    set({
      conversationMessages: get().conversationMessages.concat(requestMessages),
    });
    const messages = get().conversationMessages.map(
      ({ id, ...restMessage }) => restMessage
    );
    const requestData: CreateChatCompletionRequest = {
      messages,
      stream,
      ...restRequest,
      model: get().selectedModelId,
    };

    if (stream) {
      openAi.createChatCompletion(requestData, {
        onDownloadProgress(event: ProgressEvent) {
          const target = event.target as XMLHttpRequest;
          const newUpdates = target.responseText
            .replace("data: [DONE]", "")
            .trim()
            .split("data: ")
            .filter(Boolean);
          let id = "";
          const newUpdatesParsed: string[] = newUpdates.map((update) => {
            const parsed = JSON.parse(update);
            id = parsed.id;
            return parsed.choices[0].delta?.content || "";
          });
          const newUpdatesJoined = newUpdatesParsed.join("");
          const existingMessages = get().conversationMessages.map((x) => x);
          const existingMessageIndex = existingMessages.findLastIndex(
            (x) => x.role === "assistant" && x.id === id
          );

          if (existingMessageIndex !== -1) {
            existingMessages[existingMessageIndex].content = newUpdatesJoined;
            set({
              conversationMessages: existingMessages,
            });
          } else {
            set({
              conversationMessages: existingMessages.concat([
                {
                  role: "assistant",
                  content: newUpdatesJoined,
                  id,
                },
              ]),
            });
          }
        },
      });
    } else {
      const response = await openAi.createChatCompletion(requestData);

      if (response.data) {
        const newMessages: CreateChatCompletionRequest["messages"] =
          response.data.choices
            .filter((x) => x.message !== undefined)
            .map((x) => ({
              role: x.message!.role,
              content: x.message!.content,
            }));
        set({
          conversationMessages: get().conversationMessages.concat(newMessages),
        });
      }
    }
  },
  resetConversation: () => {
    set({ conversationMessages: [] });
  },
});
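Wiring that slice into a store and calling it could look roughly like this (illustrative; createMainSlice and selectedModelId are assumed to exist elsewhere in the app, as in the snippet above):

import { create } from 'zustand'

// Combine the app's slices into a single store.
const useStore = create<MainSlice & ChatCompletionSlice>()((...args) => ({
  ...createMainSlice(...args),
  ...createChatCompletionSlice(...args),
}))

// In a component or event handler:
useStore.getState().createChatCompletion({
  messages: [{ role: 'user', content: 'Hello!' }],
  stream: true,
})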
rodrigoGA commented 1 year ago

@gfortaine we actually use @microsoft/fetch-event-source for the playground to do streaming with POST +1

Thank you all for sharing your solutions here! I agree that @smervs solution currently looks like the best option available for the openai-node package. Here's a more complete example with proper error handling and no extra dependencies:

try {
    const res = await openai.createCompletion({
        model: "text-davinci-002",
        prompt: "It was the best of times",
        max_tokens: 100,
        temperature: 0,
        stream: true,
    }, { responseType: 'stream' });

    res.data.on('data', data => {
        const lines = data.toString().split('\n').filter(line => line.trim() !== '');
        for (const line of lines) {
            const message = line.replace(/^data: /, '');
            if (message === '[DONE]') {
                return; // Stream finished
            }
            try {
                const parsed = JSON.parse(message);
                console.log(parsed.choices[0].text);
            } catch(error) {
                console.error('Could not JSON parse stream message', message, error);
            }
        }
    });
} catch (error) {
    if (error.response?.status) {
        console.error(error.response.status, error.message);
        error.response.data.on('data', data => {
            const message = data.toString();
            try {
                const parsed = JSON.parse(message);
                console.error('An error occurred during OpenAI request: ', parsed);
            } catch(error) {
                console.error('An error occurred during OpenAI request: ', message);
            }
        });
    } else {
        console.error('An error occurred during OpenAI request', error);
    }
}

This could probably be refactored into a streamCompletion helper function (that uses either callbacks or es6 generators to emit new messages).

Apologies there's not an easier way to do this within the SDK itself โ€“ the team will continue evaluating how to get this added natively, despite the lack of support in the current sdk generator tool we're using.


Thank you @schnerd. I don't quite understand the limitations of this solution. It uses @microsoft/fetch-event-source, which, as far as I understand, encodes the information in the URL and does not do a POST. The library page mentions a limitation of 2000 characters in most browsers, but it's not clear to me whether this also affects Node.js. I'd appreciate any clarification on the limitations of this solution.

juzarantri commented 1 year ago

@rodrigoGA Can you tell me the npm package for openai?

ckarsan commented 1 year ago

@justinmahar This is a great solution, thanks. I'm testing it on Node, but it seems to strip out the triple backticks ``` when code is generated by the AI response. Is there any way to keep those in? I use them to format the code. Thanks!

juzarantri commented 1 year ago

@ckarsan what do we need to pass in const axiosConfig = { // ... }; ?

ckarsan commented 1 year ago

@juzarantri I think it's just optional parameters; I left it out entirely.

juzarantri commented 1 year ago

okay bro