openai / openai-node

The official Node.js / TypeScript library for the OpenAI API
https://www.npmjs.com/package/openai
Apache License 2.0

How to stream Text to Speech? #487

Closed: user080975 closed this issue 10 months ago

user080975 commented 11 months ago

According to the documentation here for Text to Speech: https://platform.openai.com/docs/guides/text-to-speech?lang=node

There is the possibility of streaming audio without waiting for the full file to buffer, but the example is a Python one. Is there any way to stream the incoming audio using Node.js?

abhishekgoyal1 commented 11 months ago

This needs to be added. I've tried a number of different methods, but I don't think it's natively supported in the Node SDK at the moment.

This does return a streamable object, but no chunks are found while iterating through it:

const stream = await openai.audio.speech.create(
  {
    model: 'tts-1',
    voice: 'alloy',
    input: textData,
    response_format: 'opus',
  },
  { stream: true },
);

rattrayalex commented 11 months ago

Yes, this works today – I'm sorry that the example code doesn't reflect that.

You can simply access response.body, which is a readable stream (on the web, a true ReadableStream; in Node, a Readable), like so:

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  const stream = response.body;
}
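
For instance, a minimal sketch that consumes the audio chunk by chunk as it arrives (this assumes Node, where response.body is a Readable and therefore async-iterable; handleAudioChunk is a hypothetical consumer, not part of the SDK):

async function consumeAsItArrives() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });

  // In Node, response.body is a Readable, so each chunk is a Buffer of
  // audio bytes, available as soon as the API sends it.
  for await (const chunk of response.body) {
    handleAudioChunk(chunk); // hypothetical consumer: play, forward, buffer, ...
  }
}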

I'll try to update the example soon, and won't close this issue until I do. Feel free to share use-cases you'd like to see in the example here, with sample code.

c121914yu commented 11 months ago
export async function text2Speech({
  res,
  onSuccess,
  onError,
  model = defaultAudioSpeechModels[0].model,
  voice = Text2SpeechVoiceEnum.alloy,
  input,
  speed = 1
}: {
  res: NextApiResponse;
  onSuccess: (e: { model: string; buffer: Buffer }) => void;
  onError: (e: any) => void;
  model?: string;
  voice?: `${Text2SpeechVoiceEnum}`;
  input: string;
  speed?: number;
}) {
  const ai = getAIApi();
  const response = await ai.audio.speech.create({
    model,
    voice,
    input,
    response_format: 'mp3',
    speed
  });

  const readableStream = response.body as unknown as NodeJS.ReadableStream;
  readableStream.pipe(res);

  let bufferStore = Buffer.from([]);

  readableStream.on('data', (chunk) => {
    bufferStore = Buffer.concat([bufferStore, chunk]);
  });
  readableStream.on('end', () => {
    onSuccess({ model, buffer: bufferStore });
  });
  readableStream.on('error', (e) => {
    onError(e);
  });
}

This is my example; it's built on the Next.js framework. I hope it helps.
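
A hypothetical Next.js API route wiring this helper up might look like the sketch below (the route path and the caching comment are illustrative, not from my codebase):

// pages/api/tts.ts (hypothetical route using the text2Speech helper above)
// import { text2Speech } from '...wherever the helper above lives';
import type { NextApiRequest, NextApiResponse } from 'next';

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  res.setHeader('Content-Type', 'audio/mpeg');
  await text2Speech({
    res,
    input: String(req.query.input ?? ''),
    onSuccess: ({ model, buffer }) => {
      // e.g. cache the buffer keyed by the input text
    },
    onError: (e) => console.error(e),
  });
}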

juhana commented 11 months ago

Note that the TypeScript types aren't correct when reading the response as a stream in Node. You have to write const stream = response.body as unknown as Readable; for it not to throw type errors.

rattrayalex commented 11 months ago

To fix those type errors, add import 'openai/shims/node' to the top of your file (details here) if you're on Node, or import 'openai/shims/web' if you're on anything else.

We're working to improve this.
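
For example, a minimal sketch with the Node shim in place (the key point is that the shim import must run before 'openai' is loaded):

import 'openai/shims/node'; // must come before any import of 'openai'
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'hello world',
  });
  // With the shim active, response.body is typed as a Node stream,
  // so the `as unknown as Readable` cast is no longer needed.
  response.body.pipe(fs.createWriteStream('./speech.mp3'));
}
main();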

PetersonFonseca commented 10 months ago

Hello everyone, can you please help me implement this in Node? I can't make it work...

import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_SECRET_KEY,
});

const response = openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "Teste de texto para fala.",
});

response.stream_to_file(path.resolve("./speech.mp3"));

rattrayalex commented 10 months ago

We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example:

import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';

// gets API Key from environment variable OPENAI_API_KEY
const openai = new OpenAI();

const speechFile = path.resolve(__dirname, './speech.mp3');

async function streamToFile(stream: NodeJS.ReadableStream, path: fs.PathLike) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown chicken jumped over the lazy dogs',
  });

  await streamToFile(mp3.body, speechFile);
}
main();

yaozijun commented 10 months ago

async function streamToFile(stream, path) {
  return new Promise((resolve, reject) => {
    const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);

    stream.pipe(writeStream).on('error', (error) => {
      writeStream.close();
      reject(error);
    });
  });
}

const ret = await openai.audio.speech.create({
  model: 'tts-1',
  voice: 'onyx',
  input: 'test',
});
const stream = ret.body;
const speechFile = path.resolve(`/xxx/test.mp3`);
await streamToFile(stream, speechFile);

karar-shah commented 9 months ago

@c121914yu Could you kindly assist me in playing the audio stream on the client side? Thank you.

karar-shah commented 9 months ago

> We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example: [...]

@rattrayalex, I would like to ask how to stream the audio response in a client component in Next.js. Despite searching for the past day, I have been unable to find a solution. Thank you so much for your help.

c121914yu commented 9 months ago
> @c121914yu Could you kindly assist me in playing the audio stream on the client side? Thank you.

https://github.com/labring/FastGPT/blob/main/projects/app/src/web/common/utils/voice.ts

I don't have a computer with me at the moment, so I can't easily copy the code.

You can refer to my code for client-side streaming via fetch and the MediaSource API.

However, I have found that this API has some compatibility issues on Apple products.
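
The rough shape of that approach is something like the sketch below (not the exact FastGPT code; playVoiceStream and url are illustrative names, MediaSource support for 'audio/mpeg' varies by browser, and iOS Safari lacks MediaSource entirely, which is the Apple issue mentioned above):

async function playVoiceStream(url: string) {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));

  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const response = await fetch(url);
    const reader = response.body!.getReader();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // Append each network chunk, then wait for the buffer to accept it
      // before appending the next one.
      sourceBuffer.appendBuffer(value);
      await new Promise((resolve) =>
        sourceBuffer.addEventListener('updateend', resolve, { once: true })
      );
    }
    mediaSource.endOfStream();
  });

  await audio.play();
}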

athrael-soju commented 9 months ago
> You can refer to my code for client-side streaming via fetch and the MediaSource API. However, I have found that this API has some compatibility issues on Apple products.

You're gonna need a polyfill for that.

aleksa-codes commented 4 months ago
> [c121914yu's text2Speech Next.js example, quoted from above]

Can someone please help me? How do I use this (or something similar) as an API route handler (endpoint) and call it from a frontend component in Next.js? I'm basically trying to rebuild the TTS functionality that's in ChatGPT.

LeakedDave commented 4 months ago

Hey Aleksa, I stumbled upon this because I'm building it myself. If you still need help, I rewrote the above example as a simple API route (pages router, /api/voice.js):

import OpenAI from "openai";
const openai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY,
})

export default async function handler(req, res) {
    const { input } = req.query
    res.setHeader('Content-Type', 'audio/mpeg')

    const response = await openai.audio.speech.create({
        model: "tts-1",
        voice: "alloy",
        input: input,
        response_format: 'mp3',
        speed: 1
    })

    const readableStream = response.body
    readableStream.pipe(res)

    let bufferStore = Buffer.from([])

    readableStream.on('data', (chunk) => {
        bufferStore = Buffer.concat([bufferStore, chunk])
    })

    readableStream.on('end', () => {
        // Store the mp3 somewhere if you want to reuse it
        // onSuccess({ model, buffer: bufferStore });
    })

    readableStream.on('error', (e) => {
        console.error(e)
    })
}

To play it from your client side, it's as simple as:

const input = "Today is a wonderful day to build something people love!"
new Audio(`/api/voice?input=${encodeURIComponent(input)}`).play()

kifjj commented 2 weeks ago

Hi LeakedDave and Aleksa,

I implemented the simple API route example with Node.js/Express and hosted it in several environments (Google Firebase, Google App Engine). While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds.

I tried many things on the servers (increasing memory, moving to a closer region) but no luck.

Any ideas?

LeakedDave commented 2 weeks ago

> While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds. [...] Any ideas?

Honestly, I'm not sure. If possible, I'd suggest hosting a Next.js API for this; I haven't tested it with vanilla Express at all. It sounds like your host doesn't support streaming, since a 6-8 second wait is about how long generating the full audio would take, I think.

kifjj commented 2 weeks ago

@LeakedDave oh, you are right: Firebase Functions and Google App Engine don't support streaming. Thanks for putting me on the right path. Looking around, I see AWS Lambda introduced streaming support a year ago, but with some limitations (API Gateway and ALB are not supported).

I tried it and it works: the stream starts in about 3s, or 5s on a cold start. A much better user experience.

Here is the code if someone needs it:

/* global fetch */
import util from 'util';
import stream from 'stream';
const pipeline = util.promisify(stream.pipeline);

// okey holds the OpenAI API key, e.g. read from an environment variable.
const okey = process.env.OPENAI_API_KEY;

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  console.log('Query params: ' + event['queryStringParameters']['text']);

  // console.log('event json: ' + JSON.stringify(event));

  const textToTTS = event['queryStringParameters']['text'];

  if (!textToTTS) {
    console.log('no text to translate sent [' + textToTTS + ']');
    return;
  }

  const rs = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + okey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: textToTTS,
      model: 'tts-1',
      response_format: 'mp3',
      voice: 'echo',
    }),
  });

  await pipeline(rs.body, responseStream);
});
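
One note on making this work: the streaming only reaches the client if the function is invoked through a Lambda Function URL configured with the RESPONSE_STREAM invoke mode; going through API Gateway or an ALB buffers the response, which is the limitation mentioned above.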