Open Shulyaka opened 8 months ago
Just to clarify: your request pertains to streaming the upload of audio files to the openai.audio.transcriptions.create() endpoint, correct?
We do want to support streaming request bodies in this way, but unfortunately I'm not sure that we'll be able to get to it soon.
Yes, correct. Thank you!
It's not exactly what you're describing, but it's related, and I figured those on this thread may find it useful. It chains everything together so you can stream the audio response to a prompt. It works with three threads: one streams the text reply and splits it into phrases, which are enqueued for TTS; a second runs TTS on each phrase as it completes; and a third plays each phrase aloud as soon as it has been synthesized. The effect is much like the ChatGPT app, where you get a streaming audio response to your question and don't have to wait for the full text to come back before you can start listening. What's here could surely be improved; it's primarily meant to show, in a terminal, everything put together.
https://gist.github.com/Ga68/3862688ab55b9d9b41256572b1fedc67
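For anyone skimming, the three-thread structure described above can be sketched roughly as follows. This is a simplified illustration, not the gist itself: `tts` and `play` are stand-in callables for the real TTS and audio-playback steps, and the phrase splitting is deliberately naive.

```python
import queue
import threading

def run_pipeline(text_chunks, tts, play):
    """Three-stage pipeline: split streamed text into phrases,
    synthesize each phrase, and play the audio in order.
    `tts` and `play` are placeholders for real TTS/playback calls."""
    phrases = queue.Queue()   # stage 1 -> stage 2
    audio = queue.Queue()     # stage 2 -> stage 3
    DONE = object()           # sentinel to shut each stage down

    def split_phrases():
        buf = ""
        for chunk in text_chunks:      # e.g. streamed completion deltas
            buf += chunk
            while "." in buf:          # naive phrase boundary
                phrase, buf = buf.split(".", 1)
                phrases.put(phrase + ".")
        if buf.strip():
            phrases.put(buf)
        phrases.put(DONE)

    def synthesize():
        while (phrase := phrases.get()) is not DONE:
            audio.put(tts(phrase))     # TTS each phrase as it completes
        audio.put(DONE)

    def playback():
        while (clip := audio.get()) is not DONE:
            play(clip)                 # play clips in arrival order

    threads = [threading.Thread(target=f)
               for f in (split_phrases, synthesize, playback)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each stage pulls from its own queue, playback of the first phrase can start while later phrases are still being generated and synthesized.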
Confirm this is a feature request for the Python library and not the underlying OpenAI API.
Describe the feature or improvement you're requesting
It would be nice to start the data transfer as soon as audio becomes available, for real-time voice recognition. A similar feature already exists for TTS: https://platform.openai.com/docs/guides/text-to-speech/streaming-real-time-audio Please note, I am not saying that a transcript should be available before the speech has ended; I would just like the data transfer to start earlier.
Additional context
HTTP supports sending request bodies in chunks (chunked transfer encoding) without knowing the length in advance. A WAV header does require a length field, but 0xFFFFFFFF (i.e. the maximum value) works fine with Whisper (I checked).