gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

Audio Component Streaming Behaviour is weird? #7742

Open s-kruschel opened 3 months ago

s-kruschel commented 3 months ago

Describe the bug

Hey folks,

I searched for similar issues and found several about the Gradio Audio component, but I'm not sure whether they describe the same problem.

What I'm trying to do is stream the response of the OpenAI TTS API. The OpenAI part works. However, I don't understand the behaviour of the Audio component.

What I've tried:

  1. Returning only a single bytes chunk at a time. This leads to a stuttering voice: the audio plays, stops, and then waits for the next chunk.
  2. Returning the concatenation of all bytes chunks received so far (chunks += chunk). The audio plays for about a second until the next chunk is appended; then autoplay restarts from the beginning. So the audio also stutters and never plays through.
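
To make the difference between the two attempts concrete, here is a minimal sketch in plain Python, with no Gradio or OpenAI involved. `fake_tts_chunks`, `per_chunk`, and `cumulative` are hypothetical names used only for illustration, and the two-byte "chunks" stand in for real audio data. It only shows which payload the output component would receive on each update, not how the component plays it.

```python
from typing import Iterable, Iterator


def fake_tts_chunks() -> Iterator[bytes]:
    """Stand-in for the OpenAI byte stream: three tiny 'audio' chunks."""
    yield from (b"AA", b"BB", b"CC")


def per_chunk(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Attempt 1: each update carries only the newest chunk."""
    for chunk in chunks:
        yield chunk


def cumulative(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Attempt 2: each update carries everything received so far."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        yield buffer


per_chunk_updates = list(per_chunk(fake_tts_chunks()))
cumulative_updates = list(cumulative(fake_tts_chunks()))
print(per_chunk_updates)    # every payload is new audio only
print(cumulative_updates)   # every payload begins with the audio already sent
```

In the second case every payload starts with the audio that was already sent, which fits the observation that playback restarts from the beginning on each update.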

Furthermore, only out = gr.Audio(autoplay=True) seems to work. out = gr.Audio(autoplay=True, streaming=True) does nothing at all, and I can't tell why.

In my opinion, the ideal behaviour would be: if streaming=True is set and incoming chunks are appended to the existing ones, the Audio component keeps playing instead of restarting from the beginning each time.
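
As a rough illustration of the behaviour described here, the following is a toy model of a player whose buffer grows while the playback position is preserved. `AppendingPlayer` is a hypothetical class, not part of Gradio. Byte offsets stand in for playback time, which is a simplification, since compressed audio does not map bytes to time this directly.

```python
class AppendingPlayer:
    """Toy model of a player whose buffer grows while it plays."""

    def __init__(self) -> None:
        self.buffer = b""
        self.position = 0  # number of bytes already played

    def append(self, chunk: bytes) -> None:
        # New data extends the buffer; the playback position is left untouched.
        self.buffer += chunk

    def play(self, n_bytes: int) -> bytes:
        # Return up to n_bytes of unplayed audio and advance the position.
        played = self.buffer[self.position:self.position + n_bytes]
        self.position += len(played)
        return played


player = AppendingPlayer()
player.append(b"AABB")
first = player.play(3)     # plays part of the first chunk
player.append(b"CCDD")     # a new chunk arrives mid-playback
second = player.play(10)   # playback continues where it stopped
print(first, second)
```

The point of the sketch is that `append` never resets `position`, so a newly arrived chunk extends what can be played without causing a restart.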

Have you searched existing issues? 🔎

Reproduction

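import gradio as gr
from openai import OpenAI

# Assumed setup (not shown in the original report): the client reads
# OPENAI_API_KEY from the environment.
client = OpenAI()
tts_generator = None  # lazily created generator over the TTS byte stream

def text_to_speech_streaming():
    # Stream the synthesized speech from OpenAI in 8 KiB chunks.
    with client.audio.speech.with_streaming_response.create(
        model="tts-1-hd",
        voice="alloy",
        input="This is a special test text that I want to get generated to test streaming the generated voice directly from OpenAI into my gradio application."
    ) as response:
        for chunk in response.iter_bytes(chunk_size=8192):
            yield chunk

def add_to_stream(audio, instream):
    global tts_generator

    # Nothing recorded yet: leave the output untouched.
    if audio is None:
        return gr.update(), instream

    if tts_generator is None:
        tts_generator = text_to_speech_streaming()

    try:
        chunk = next(tts_generator)
    except StopIteration:
        # Stream exhausted: reset the generator and leave the output untouched.
        tts_generator = None
        return gr.update(), instream

    return chunk, chunk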

with gr.Blocks() as demo:
    inp = gr.Audio(sources="microphone")
    out = gr.Audio(streaming=True)
    stream = gr.State()

    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])

if __name__ == "__main__":
    demo.launch()
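
One detail of add_to_stream worth isolating is what happens when the generator runs out. Below is a minimal sketch of a sentinel-based way to pull chunks, assuming a stand-in generator `make_chunks` in place of text_to_speech_streaming(). All names in it are hypothetical and it does not touch Gradio or OpenAI.

```python
from typing import Iterator, Optional

_generator: Optional[Iterator[bytes]] = None


def make_chunks() -> Iterator[bytes]:
    """Hypothetical stand-in for text_to_speech_streaming()."""
    yield from (b"one", b"two")


def next_chunk() -> Optional[bytes]:
    """Return the next chunk, or None once the stream is exhausted."""
    global _generator
    if _generator is None:
        _generator = make_chunks()
    chunk = next(_generator, None)  # None instead of raising StopIteration
    if chunk is None:
        _generator = None  # reset so a later call starts a fresh stream
    return chunk


results = [next_chunk() for _ in range(4)]
print(results)
```

The caller can then check for None and skip the output update, rather than relying on a variable that is only assigned inside a try block.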

Screenshot

No response

Logs

No response

System Info

Gradio Environment Information:
------------------------------
Operating System: Darwin
gradio version: 4.18.0
gradio_client version: 0.10.0

------------------------------------------------
gradio dependencies in your environment:

aiofiles: 23.2.1
altair: 5.2.0
fastapi: 0.109.2
ffmpy: 0.3.2
gradio-client==0.10.0 is not installed.
httpx: 0.26.0
huggingface-hub: 0.20.3
importlib-resources: 6.1.1
jinja2: 3.1.3
markupsafe: 2.1.5
matplotlib: 3.8.2
numpy: 1.26.4
orjson: 3.9.13
packaging: 23.2
pandas: 2.2.1
pillow: 10.2.0
pydantic: 2.6.1
pydub: 0.25.1
python-multipart: 0.0.9
pyyaml: 6.0.1
ruff: 0.2.1
semantic-version: 2.10.0
tomlkit==0.12.0 is not installed.
typer: 0.9.0
typing-extensions: 4.9.0
uvicorn: 0.27.1
authlib; extra == 'oauth' is not installed.
itsdangerous; extra == 'oauth' is not installed.

gradio_client dependencies in your environment:

fsspec: 2024.2.0
httpx: 0.26.0
huggingface-hub: 0.20.3
packaging: 23.2
typing-extensions: 4.9.0
websockets: 11.0.3

Severity

I can work around it

ajayarora1235 commented 4 weeks ago

Did you end up finding a solution to this?

s-kruschel commented 4 weeks ago

Unfortunately not…