openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

How to stream Text to Speech? #487

Closed: user080975 closed this issue 10 months ago

user080975 commented 11 months ago

According to the documentation here for Text to Speech: https://platform.openai.com/docs/guides/text-to-speech?lang=node

There is the possibility of streaming audio without waiting for the full file to buffer, but the example is a Python one. Is there any way to stream the incoming audio using Node.js?

abhishekgoyal1 commented 11 months ago

This needs to be added. I've tried a number of different methods, but I don't think it's natively supported in the Node SDK at the moment.

This does return a streamable object, but no chunks are found while iterating through it:

const stream = await openai.audio.speech.create(
  {
    model: 'tts-1',
    voice: 'alloy',
    input: textData,
    response_format: 'opus',
  },
  { stream: true },
);

rattrayalex commented 11 months ago

Yes, this works today – I'm sorry that the example code doesn't reflect that.

You can simply access response.body, which is a readable stream (on the web, a true ReadableStream; in Node, a Readable), like so:

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  const stream = response.body;
}
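
For instance, a minimal sketch that consumes the audio chunk by chunk as it arrives (this assumes Node, where response.body is a Readable and therefore async-iterable; handleAudioChunk is a hypothetical consumer, not part of the SDK):

async function consumeAsItArrives() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  // In Node, response.body is a Readable, so each chunk is a Buffer of
  // audio bytes, available as soon as the API sends it.
  for await (const chunk of response.body) {
    handleAudioChunk(chunk); // hypothetical consumer: play, forward, buffer, ...
  }
}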

I'll try to update the example soon, and won't close this issue until I do. Feel free to share use-cases you'd like to see in the example here, with sample code.

c121914yu commented 11 months ago
export async function text2Speech({
  res,
  onSuccess,
  onError,
  model = defaultAudioSpeechModels[0].model,
  voice = Text2SpeechVoiceEnum.alloy,
  input,
  speed = 1
}: {
  res: NextApiResponse;
  onSuccess: (e: { model: string; buffer: Buffer }) => void;
  onError: (e: any) => void;
  model?: string;
  voice?: `${Text2SpeechVoiceEnum}`;
  input: string;
  speed?: number;
}) {
  const ai = getAIApi();
  const response = await ai.audio.speech.create({
    model,
    voice,
    input,
    response_format: 'mp3',
    speed
  });

  const readableStream = response.body as unknown as NodeJS.ReadableStream;
  readableStream.pipe(res);

  let bufferStore = Buffer.from([]);

  readableStream.on('data', (chunk) => {
    bufferStore = Buffer.concat([bufferStore, chunk]);
  });
  readableStream.on('end', () => {
    onSuccess({ model, buffer: bufferStore });
  });
  readableStream.on('error', (e) => {
    onError(e);
  });
}

This is my example; it's built on the Next.js framework. I hope it helps.
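
A hypothetical Next.js API route wiring this helper up might look like the sketch below (the route path and the caching comment are illustrative, not from my codebase):

// pages/api/tts.ts (hypothetical route using the text2Speech helper above)
// import { text2Speech } from '...wherever the helper above lives';
import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  res.setHeader('Content-Type', 'audio/mpeg');
  await text2Speech({
    res,
    input: String(req.query.input ?? ''),
    onSuccess: ({ model, buffer }) => {
      // e.g. cache the buffer keyed by the input text
    },
    onError: (e) => console.error(e),
  });
}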

juhana commented 11 months ago

Note that the TypeScript types aren't correct when reading the response as a stream in Node. You have to write const stream = response.body as unknown as Readable; for it not to throw type errors.

rattrayalex commented 11 months ago

To fix those type errors, add import 'openai/shims/node' to the top of your file (details here) if you're on Node, or import 'openai/shims/web' if you're on anything else.

We're working to improve this.
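
For example, a minimal sketch with the Node shim in place (the key point is that the shim import must run before 'openai' is loaded):

import 'openai/shims/node'; // must come before any import of 'openai'
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'hello world',
  });
  // With the shim active, response.body is typed as a Node stream,
  // so the `as unknown as Readable` cast is no longer needed.
  response.body.pipe(fs.createWriteStream('./speech.mp3'));
}
main();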

PetersonFonseca commented 10 months ago

Hello everyone, can you please help me implement this in Node? I can't make it work...

import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_SECRET_KEY,
});

const response = openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "Teste de texto para fala.",
});

response.stream_to_file(path.resolve("./speech.mp3"));

rattrayalex commented 10 months ago

We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example:

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

// gets API Key from environment variable OPENAI_API_KEY
const openai = new OpenAI();

const speechFile = path.resolve(__dirname, './speech.mp3');

async function streamToFile(stream: NodeJS.ReadableStream, path: fs.PathLike) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown chicken jumped over the lazy dogs',
  });

  await streamToFile(mp3.body, speechFile);
}
main();

yaozijun commented 10 months ago

async function streamToFile(stream, path) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

const ret = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'onyx',
  input: 'test',
});
const stream = ret.body;
const speechFile = path.resolve(`/xxx/test.mp3`);
await streamToFile(stream, speechFile);

karar-shah commented 9 months ago

@c121914yu Could you kindly assist me in playing the audio stream on the client side? Thank you.

karar-shah commented 9 months ago

> We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example: [...]

@rattrayalex, I would like to ask how to stream the audio response in a client component in Next.js. Despite searching for the past day, I have been unable to find a solution. Thank you so much for your help.

c121914yu commented 9 months ago
> @c121914yu Could you kindly assist me in playing the audio stream on the client side? Thank you.

https://github.com/labring/FastGPT/blob/main/projects/app/src/web/common/utils/voice.ts

I don't have a computer with me at the moment, so I can't easily copy the code.

You can refer to my code for client-side streaming via fetch and the MediaSource API.

However, I have found that this API has some compatibility issues on Apple products.
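
The rough shape of that approach is something like the sketch below (not the exact FastGPT code; playVoiceStream and url are illustrative names, MediaSource support for 'audio/mpeg' varies by browser, and iOS Safari lacks MediaSource entirely, which is the Apple issue mentioned above):

async function playVoiceStream(url: string) {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));

  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const response = await fetch(url);
    const reader = response.body!.getReader();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Append each network chunk, then wait for the buffer to accept it
      // before appending the next one.
      sourceBuffer.appendBuffer(value);
      await new Promise((resolve) =>
        sourceBuffer.addEventListener('updateend', resolve, { once: true })
      );
    }
    mediaSource.endOfStream();
  });

  await audio.play();
}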

athrael-soju commented 9 months ago
> You can refer to my code for client-side streaming via fetch and the MediaSource API. However, I have found that this API has some compatibility issues on Apple products.

You're gonna need a polyfill for that.

aleksa-codes commented 4 months ago
> [c121914yu's text2Speech Next.js example, quoted from above]

Can someone please help me? How do I use this (or something similar) as an API route handler (endpoint) and call it from a frontend component in Next.js? I'm basically trying to rebuild the TTS functionality that's in ChatGPT.

LeakedDave commented 4 months ago

Hey Aleksa, I stumbled upon this because I'm building it myself. If you still need help, I rewrote the above example as a simple API route (pages router, /api/voice.js):

import OpenAI from "openai";
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
})

export default async function handler(req, res) {
    const { input } = req.query
    res.setHeader('Content-Type', 'audio/mpeg')

    const response = await openai.audio.speech.create({
        model: "tts-1",
        voice: "alloy",
        input: input,
        response_format: 'mp3',
        speed: 1
    })

    const readableStream = response.body
    readableStream.pipe(res)

    let bufferStore = Buffer.from([])

    readableStream.on('data', (chunk) => {
        bufferStore = Buffer.concat([bufferStore, chunk])
    })

    readableStream.on('end', () => {
        // Store the mp3 somewhere if you want to reuse it
        // onSuccess({ model, buffer: bufferStore });
    })

    readableStream.on('error', (e) => {
        console.error(e)
    })
}

To play it from your client side, it's as simple as:

const input = "Today is a wonderful day to build something people love!"
new Audio(`/api/voice?input=${encodeURIComponent(input)}`).play()

kifjj commented 2 weeks ago

Hi LeakedDave and Aleksa,

I implemented the simple API route example with Node.js/Express and hosted it in several environments (Google Firebase, Google App Engine). While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds.

I tried many things on the servers (increasing memory, moving to a closer region) but no luck.

Any ideas?

LeakedDave commented 2 weeks ago

> While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds. [...] Any ideas?

Honestly, I'm not sure. If possible, I'd suggest hosting a Next.js API for this; I haven't tested it with vanilla Express at all. It sounds like your host doesn't support streaming, since a 6-8 second wait is about how long generating the full audio would take, I think.

kifjj commented 2 weeks ago

@LeakedDave oh, you are right: Firebase Functions and Google App Engine don't support streaming. Thanks for putting me on the right path. Looking around, I see AWS Lambda introduced streaming support a year ago, but with some limitations (API Gateway and ALB are not supported).

I tried it and it works: the stream starts in about 3s, or 5s on a cold start. A much better user experience.

Here is the code if someone needs it:

/* global fetch */
import util from 'util';
import stream from 'stream';
const pipeline = util.promisify(stream.pipeline);

// okey holds the OpenAI API key, e.g. read from an environment variable.
const okey = process.env.OPENAI_API_KEY;

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  console.log('Query params: ' + event['queryStringParameters']['text']);

  // console.log('event json: ' + JSON.stringify(event));

  const textToTTS = event['queryStringParameters']['text'];

  if (!textToTTS) {
    console.log('no text to translate sent [' + textToTTS + ']');
    return;
  }

  const rs = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + okey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: textToTTS,
      model: 'tts-1',
      response_format: 'mp3',
      voice: 'echo',
    }),
  });

  await pipeline(rs.body, responseStream);
});
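
One note on making this work: the streaming only reaches the client if the function is invoked through a Lambda Function URL configured with the RESPONSE_STREAM invoke mode; going through API Gateway or an ALB buffers the response, which is the limitation mentioned above.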