Streaming text to speech?

nate-simpson commented 3 weeks ago

I see in the official docs at https://platform.openai.com/docs/guides/text-to-speech the following:

The Speech API provides support for real time audio streaming using chunk transfer encoding. This means that the audio is able to be played before the full file has been generated and made accessible.

There's Python code for it, apparently, but this library does not seem to support it yet. If it is supported, can you point me to it? Otherwise, it would be really nice to have: with any moderately long chunk of text, the TTS latency is too high to offer reading back chat replies.

KrzysztofCwalina commented 3 weeks ago

I have not tried it, but there is an overload of AudioClient.GenerateSpeechFromText that might work for you:

public virtual ClientResult GenerateSpeechFromText(BinaryContent content, RequestOptions options = null);

To call this in streaming mode:

RequestOptions options = new() { BufferResponse = false };

var json = BinaryData.FromObjectAsJson(new {
    model = "tts-1",
    input = "Today is a wonderful day to build something people love!",
    voice = "alloy"
});

AudioClient client = ...;
var result = client.GenerateSpeechFromText(BinaryContent.Create(json), options);
PipelineResponse response = result.GetRawResponse();
using Stream stream = response.ContentStream; // very important to dispose the stream
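
Once you have the unbuffered stream, the remaining step is to consume it as the bytes arrive instead of waiting for the whole response. Here is a minimal sketch of that read loop, using only System.IO. The StreamingAudio class and CopyInChunks method are hypothetical names made up for illustration, not part of the library, and the sink could be a file, a network socket, or whatever your audio player reads from:

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the OpenAI library.
public static class StreamingAudio
{
    // Reads from `source` in chunks and forwards each chunk to `sink`
    // as soon as it is read. Returns the total number of bytes copied.
    public static long CopyInChunks(Stream source, Stream sink, int bufferSize = 8192)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;

        // Read returns 0 only at the end of the stream.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            sink.Write(buffer, 0, read);
            sink.Flush(); // push the chunk onward without waiting for the rest
            total += read;
        }

        return total;
    }
}
```

With the snippet above, you would pass the response stream as the source, for example StreamingAudio.CopyInChunks(stream, file) where file is a FileStream you opened yourself. I have not measured how small the chunks are in practice, so treat the buffer size as a starting point.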

nate-simpson commented 2 weeks ago

This seems to work nicely. Thanks!

openai / openai-dotnet

Streaming text to speech? #45