I searched for similar issues and found several about the Gradio Audio component, but I'm not sure whether they describe the same problem.
What I'm trying to do is stream the OpenAI TTS API response into an Audio component. The OpenAI part is working. However, I don't understand how the Audio component behaves.
What I've tried:
Returning only a single bytes chunk at a time. This leads to a stuttering voice: the audio plays, stops, and resumes only when the next chunk arrives.
Returning the concatenation of all chunks received so far (chunks += chunk). The audio plays for about a second until the next chunk is appended, then autoplay restarts from the beginning. So the audio stutters here as well and never plays through.
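The two attempts above differ only in what is handed to the output on each step. A minimal sketch of both, independent of Gradio and OpenAI; the function names and `fake_chunks` are hypothetical stand-ins for the handler logic and for the bytes coming from `response.iter_bytes()`:

```python
from typing import Iterable, Iterator


def yield_single_chunks(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Attempt 1: hand each chunk to the output on its own."""
    for chunk in chunks:
        yield chunk


def yield_accumulated(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Attempt 2: hand over everything received so far (chunks += chunk)."""
    received = b""
    for chunk in chunks:
        received += chunk
        yield received


# Hypothetical stand-in for the byte chunks of the TTS response.
fake_chunks = [b"aa", b"bb", b"cc"]

single = list(yield_single_chunks(fake_chunks))
accumulated = list(yield_accumulated(fake_chunks))
```

In the second variant every update contains the audio from the very start again, which matches the observation that playback restarts from the beginning on each update.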
Further, only out = gr.Audio(autoplay=True) seems to work.
With out = gr.Audio(autoplay=True, streaming=True), nothing happens at all, and I don't know why.
In my opinion, the ideal behaviour would be this: when streaming=True is set and incoming chunks are appended to the existing ones, the Audio component keeps playing from its current position instead of restarting.
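To make the requested semantics concrete, here is a toy model of a player that keeps its playback position when new data is appended. `AppendOnlyPlayer` is purely hypothetical and is not a Gradio API; it only illustrates the behaviour I would expect:

```python
class AppendOnlyPlayer:
    """Toy model (not a Gradio API): appending data never resets playback."""

    def __init__(self) -> None:
        self.buffer = b""
        self.position = 0  # index of the next byte to play

    def append(self, chunk: bytes) -> None:
        # New audio is added at the end; the playback position is untouched.
        self.buffer += chunk

    def play(self, n: int) -> bytes:
        # Return up to n bytes starting at the current position and advance.
        out = self.buffer[self.position:self.position + n]
        self.position += len(out)
        return out


player = AppendOnlyPlayer()
player.append(b"abcd")
first = player.play(3)
player.append(b"efgh")  # arrives while playback is in progress
second = player.play(3)  # continues where it left off instead of restarting
```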
Have you searched existing issues? 🔎
[X] I have searched and found no existing issues
Reproduction
import gradio as gr
from openai import OpenAI

# Setup assumed from context; it was not part of the original snippet.
client = OpenAI()  # reads OPENAI_API_KEY from the environment
tts_generator = None


def text_to_speech_streaming():
    # Yield the synthesized speech as raw byte chunks while it is generated.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd",
        voice="alloy",
        input="This is a special test text that I want to get generated to test streaming the generated voice directly from OpenAI into my gradio application.",
    ) as response:
        for chunk in response.iter_bytes(chunk_size=8192):
            yield chunk


def add_to_stream(audio, instream):
    global tts_generator
    if audio is None:
        return gr.update(), instream
    if tts_generator is None:
        tts_generator = text_to_speech_streaming()
    try:
        chunk = next(tts_generator)
    except StopIteration:
        # Generator exhausted: reset it and leave the output unchanged.
        # (Without this return, chunk would be unbound below.)
        tts_generator = None
        return gr.update(), instream
    return chunk, chunk


with gr.Blocks() as demo:
    inp = gr.Audio(sources="microphone")
    out = gr.Audio(streaming=True)
    stream = gr.State()
    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])

if __name__ == "__main__":
    demo.launch()
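A handler that pulls one chunk per call also has to cope with the generator running out. A minimal sketch of that step on its own, without Gradio or OpenAI, using the built-in `next()` with a default value; `next_chunk_or_none` and `fake_tts` are hypothetical names used only for illustration:

```python
from typing import Iterator, Optional


def next_chunk_or_none(gen: Iterator[bytes]) -> Optional[bytes]:
    """Return the next chunk, or None once the generator is exhausted."""
    return next(gen, None)


def fake_tts() -> Iterator[bytes]:
    # Hypothetical stand-in for text_to_speech_streaming().
    yield b"chunk-1"
    yield b"chunk-2"


gen = fake_tts()
results = [next_chunk_or_none(gen) for _ in range(3)]
```

The caller can then treat `None` as the signal to reset the generator and skip the output update.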
Screenshot
No response
Logs
No response
System Info
Gradio Environment Information:
------------------------------
Operating System: Darwin
gradio version: 4.18.0
gradio_client version: 0.10.0
------------------------------------------------
gradio dependencies in your environment:
aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.109.2
ffmpy: 0.3.2
gradio-client==0.10.0 is not installed.
httpx: 0.26.0
huggingface-hub: 0.20.3
importlib-resources: 6.1.1
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.2
numpy: 1.26.4
orjson: 3.9.13
packaging: 23.2
pandas: 2.2.1
pillow: 10.2.0
pydantic: 2.6.1
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.2.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.9.0
uvicorn: 0.27.1
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.
gradio_client dependencies in your environment:
fsspec: 2024.2.0
httpx: 0.26.0
huggingface-hub: 0.20.3
packaging: 23.2
typing-extensions: 4.9.0
websockets: 11.0.3
Severity
I can work around it