This needs to be added. I've tried several different approaches, but I don't think it's supported natively in the Node SDK at the moment.
The following call returns a streamable object, but no chunks are found while iterating through it:
const stream = await openai.audio.speech.create(
  {
    model: 'tts-1',
    voice: 'alloy',
    input: textData,
    response_format: 'opus',
  },
  { stream: true },
);
Yes, this works today – I'm sorry that the example code doesn't reflect that.
You can simply access response.body, which is a readable stream (in web environments a true ReadableStream, and in Node a Readable), like so:
async function main() {
const response = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'the quick brown fox jumped over the lazy dogs',
});
const stream = response.body;
}
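For example, here's a minimal sketch of consuming that stream chunk by chunk in Node (the cast works around the TypeScript issue noted further down in this thread):
import OpenAI from 'openai';
import type { Readable } from 'stream';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'the quick brown fox jumped over the lazy dogs',
  });
  // In Node, response.body is a Readable, which is async-iterable
  const stream = response.body as unknown as Readable;
  for await (const chunk of stream) {
    console.log(`received ${(chunk as Buffer).length} bytes`);
  }
}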
I'll try to update the example soon, and won't close this issue until I do. Feel free to share use-cases you'd like to see in the example here, with sample code.
export async function text2Speech({
res,
onSuccess,
onError,
model = defaultAudioSpeechModels[0].model,
voice = Text2SpeechVoiceEnum.alloy,
input,
speed = 1
}: {
res: NextApiResponse;
onSuccess: (e: { model: string; buffer: Buffer }) => void;
onError: (e: any) => void;
model?: string;
voice?: `${Text2SpeechVoiceEnum}`;
input: string;
speed?: number;
}) {
const ai = getAIApi();
const response = await ai.audio.speech.create({
model,
voice,
input,
response_format: 'mp3',
speed
});
  // The SDK types response.body as a web stream; cast it to a Node stream
  const readableStream = response.body as unknown as NodeJS.ReadableStream;
  // Stream the audio to the client as it arrives, while also buffering a copy
  readableStream.pipe(res);
  let bufferStore = Buffer.from([]);
  readableStream.on('data', (chunk) => {
    bufferStore = Buffer.concat([bufferStore, chunk]);
  });
  readableStream.on('end', () => {
    onSuccess({ model, buffer: bufferStore });
  });
  readableStream.on('error', (e) => {
    onError(e);
  });
}
This is my example; it's from a Next.js project. I hope it's helpful.
Note that the TypeScript types aren't correct when reading the response as a stream in Node. You have to write const stream = response.body as unknown as Readable; for it to not throw type errors.
To fix those type errors properly, add import 'openai/shims/node' to the top of your file (details here) if you're on Node, or import 'openai/shims/web' if you're on anything else.
We're working to improve this.
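For example, here's a minimal sketch assuming the Node shim (with it in place, response.body should be typed as a node-fetch stream, so the cast above becomes unnecessary):
import 'openai/shims/node'; // must be imported before 'openai'
import fs from 'fs';
import OpenAI from 'openai';

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: 'tts-1',
    voice: 'alloy',
    input: 'hello world',
  });
  // With the shim, body is a Node stream and pipes without casts
  response.body.pipe(fs.createWriteStream('speech.mp3'));
}

main();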
Hello everyone, can you please help me implement this on Node? I can't make it work...
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_SECRET_KEY,
});

const response = openai.audio.speech.create({
  model: "tts-1",
  voice: "onyx",
  input: "Teste de texto para fala.",
});

response.stream_to_file(path.resolve("./speech.mp3"));
We don't provide a stream_to_file method; instead, use response.body.pipe(fs.createWriteStream(myPath)). Here's a complete example:
import OpenAI from 'openai';
import fs from 'fs';
import path from 'path';
// gets API Key from environment variable OPENAI_API_KEY
const openai = new OpenAI();
const speechFile = path.resolve(__dirname, './speech.mp3');
async function streamToFile(stream: NodeJS.ReadableStream, path: fs.PathLike) {
return new Promise((resolve, reject) => {
const writeStream = fs.createWriteStream(path).on('error', reject).on('finish', resolve);
stream.pipe(writeStream).on('error', (error) => {
writeStream.close();
reject(error);
});
});
}
async function main() {
const mp3 = await openai.audio.speech.create({
model: 'tts-1',
voice: 'alloy',
input: 'the quick brown chicken jumped over the lazy dogs',
});
await streamToFile(mp3.body, speechFile);
}
main();
// Assumes fs, path, and an OpenAI client are set up as in the example above
async function streamToFile(stream, path) {
return new Promise((resolve, reject) => {
const writeStream = fs.createWriteStream(path)
.on('error', reject)
.on('finish', resolve);
stream.pipe(writeStream)
.on('error', (error) => {
writeStream.close();
reject(error);
});
});
}
const ret = await openai.audio.speech.create({
model: "tts-1",
voice: "onyx",
input: "test",
});
const stream = ret.body;
const speechFile = path.resolve(`/xxx/test.mp3`);
await streamToFile(stream, speechFile);
@c121914yu Could you kindly assist me in playing the audio stream from your text2Speech example on the client side? Thank you.
@rattrayalex I would like to ask about streaming the audio response in a client component in Next.js. Despite searching for the past day, I have been unable to find a solution. Thank you so much for your help.
https://github.com/labring/FastGPT/blob/main/projects/app/src/web/common/utils/voice.ts
I haven't brought a computer with me recently, so I can't copy the code easily. You can refer to my code for client-side streaming via fetch and the MediaSource API. However, I have found that this API has some compatibility issues on Apple products.
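For reference, a minimal sketch of that fetch + MediaSource approach looks roughly like this (the /api/voice endpoint and function name are hypothetical, not the exact FastGPT code):
// Stream MP3 from a TTS endpoint into an <audio> element via MediaSource
async function playStreamedSpeech(url: string) {
  const mediaSource = new MediaSource();
  const audio = new Audio(URL.createObjectURL(mediaSource));

  mediaSource.addEventListener('sourceopen', async () => {
    const sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');
    const response = await fetch(url);
    const reader = response.body!.getReader();

    // appendBuffer is async; wait for 'updateend' before appending again
    const appendChunk = (chunk: Uint8Array) =>
      new Promise<void>((resolve) => {
        sourceBuffer.addEventListener('updateend', () => resolve(), { once: true });
        sourceBuffer.appendBuffer(chunk);
      });

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      await appendChunk(value);
    }
    mediaSource.endOfStream();
  });

  // Browsers usually require a user gesture before audio can play
  await audio.play();
}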
You're gonna need a polyfill for that.
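If it helps, one workaround sketch (an assumption on my part, not tested on every device) is to feature-detect Safari's ManagedMediaSource variant and fall back to plain, non-streaming playback when no flavor of MSE is available:
// Pick whichever MSE flavor the browser offers; ManagedMediaSource is the
// variant recent Apple platforms ship (assumption: compatible for this use)
const MediaSourceImpl = (window as any).ManagedMediaSource ?? window.MediaSource;

if (!MediaSourceImpl) {
  // No MSE at all: let the browser buffer the whole file instead
  new Audio('/api/voice?input=' + encodeURIComponent('hello')).play();
}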
Can someone please help me? How do I use the text2Speech example above (or something similar) to build an API route handler (endpoint) and call it from a frontend component in Next.js? I'm basically trying to rebuild the TTS functionality that's in ChatGPT.
Hey Aleksa, stumbled upon this because I'm building it myself. If you still need help... I rewrote the above example as a simple API route (pages router, /api/voice.js):
import OpenAI from "openai";
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
})
export default async function handler(req, res) {
const { input } = req.query
res.setHeader('Content-Type', 'audio/mpeg')
const response = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: input,
response_format: 'mp3',
speed: 1
})
const readableStream = response.body
readableStream.pipe(res)
let bufferStore = Buffer.from([])
readableStream.on('data', (chunk) => {
bufferStore = Buffer.concat([bufferStore, chunk])
})
readableStream.on('end', () => {
// Store the mp3 somewhere if you want to reuse it
// onSuccess({ model, buffer: bufferStore });
})
readableStream.on('error', (e) => {
console.error(e)
})
}
To play it from your client side, it's as simple as:
const input = "Today is a wonderful day to build something people love!"
// encodeURIComponent keeps spaces and special characters URL-safe
new Audio(`/api/voice?input=${encodeURIComponent(input)}`).play()
Hi LeakedDave and Aleksa,
I implemented the simple API route example (Node.js/Express) and hosted it in several environments (Google Firebase, Google App Engine). While the streaming works, I observed a strange thing: the audio only starts playing after 6 to 8 seconds.
I tried many things on the servers (increasing memory, moving to a closer region) but no luck.
Any ideas?
Honestly I'm not sure. If possible, I'd suggest hosting a Next.js API route for this; I haven't tested it with vanilla Express at all. It sounds like your host doesn't support streaming, since a 6-7 second wait would be the time for the full audio, I think.
@LeakedDave oh, you are right: Firebase Functions and Google App Engine don't support streaming responses. Thanks for putting me on the right path. Looking around, I see AWS Lambda introduced streaming support a year ago, but with some limitations (API Gateway and ALB are not supported).
I tried it and it works: the stream starts in about 3 seconds, or 5 seconds on a cold start. A much better user experience.
Here is the code if someone needs it:
/* global fetch */
import util from 'util';
import stream from 'stream';
const pipeline = util.promisify(stream.pipeline);

/* global awslambda */
export const handler = awslambda.streamifyResponse(async (event, responseStream, _context) => {
  console.log("Query params: " + event["queryStringParameters"]["text"]);
  //console.log("event json: " + JSON.stringify(event));
  const textToTTS = event["queryStringParameters"]["text"];
  if (!textToTTS) {
    console.log("no text to translate sent [" + textToTTS + "]");
    return;
  }
  // okey holds the OpenAI API key
  const rs = await fetch('https://api.openai.com/v1/audio/speech', {
    method: 'POST',
    headers: {
      Authorization: 'Bearer ' + okey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      input: textToTTS,
      model: 'tts-1',
      response_format: 'mp3',
      voice: 'echo',
    }),
  });
  // Pipe the OpenAI audio stream straight into the Lambda response stream
  await pipeline(rs.body, responseStream);
});
According to the Text to Speech documentation here: https://platform.openai.com/docs/guides/text-to-speech?lang=node
there is the possibility of streaming audio without waiting for the full file to buffer, but the example is a Python one. Is there any way of streaming the incoming audio using Node.js?