gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0

Audio Streaming: large latency before first chunk is played #8185

Open sanchit-gandhi opened 2 weeks ago

sanchit-gandhi commented 2 weeks ago

We typically stream audio outputs when latency is a major consideration. E.g. if we're generating 10 seconds of audio and want the perceived latency to be as low as possible, we can stream the outputs in 1-second chunks, so that the user can start playing the audio roughly 10x sooner than if they waited for the full 10-second audio. Here's an example for Parler-TTS.
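To make the "10x" figure concrete, here is a rough back-of-the-envelope sketch. It assumes generation time scales linearly with audio length; the function name and the `seconds_per_audio_second` parameter are hypothetical and only for illustration.

```python
def time_to_first_audio(total_seconds: float,
                        chunk_seconds: float,
                        seconds_per_audio_second: float = 1.0) -> tuple[float, float]:
    """Return (wait without streaming, wait with streaming) in seconds.

    Assumption: generating N seconds of audio takes
    N * seconds_per_audio_second seconds of wall-clock time.
    """
    # Without streaming, the user waits for the whole clip to be generated.
    full_wait = total_seconds * seconds_per_audio_second
    # With streaming, the user only waits for the first chunk.
    first_chunk = min(chunk_seconds, total_seconds)
    streaming_wait = first_chunk * seconds_per_audio_second
    return full_wait, streaming_wait


full_wait, streaming_wait = time_to_first_audio(10.0, 1.0)
speedup = full_wait / streaming_wait
print(full_wait, streaming_wait, speedup)
```

Under these assumptions the ideal speedup is simply the ratio of total length to chunk length, so any fixed delay added on top of the first chunk eats directly into it.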

When using the Gradio streaming component, we typically have to wait 3-4 seconds after the first chunk is returned before the output starts playing. This fixed overhead negates the latency improvement we expect from streaming. The result is that it's very difficult to showcase streaming outputs using Gradio.

This Space demonstrates the issue in an MWE: https://huggingface.co/spaces/sanchit-gandhi/audio-streaming. We have a 30-second audio clip, which we stream in 2-second chunks. It takes 1 second for the first chunk to be returned, but the audio only starts playing after an additional 3-4 seconds.
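For reference, the chunking side of such an MWE can be sketched as below. This is not the Space's actual code: the `stream_chunks` helper, the 16 kHz sampling rate, and the use of silence as the audio are all assumptions made to keep the example self-contained.

```python
from typing import Iterator, List, Tuple


def stream_chunks(samples: List[float],
                  sampling_rate: int,
                  chunk_seconds: float) -> Iterator[Tuple[int, List[float]]]:
    """Yield (sampling_rate, chunk) pairs of at most chunk_seconds each."""
    chunk_len = int(sampling_rate * chunk_seconds)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds is too small for this sampling rate")
    for start in range(0, len(samples), chunk_len):
        yield sampling_rate, samples[start:start + chunk_len]


# 30 seconds of silence at an assumed 16 kHz, streamed in 2-second chunks.
sampling_rate = 16_000
audio = [0.0] * (30 * sampling_rate)
chunks = list(stream_chunks(audio, sampling_rate, 2.0))
print(len(chunks))  # 15
```

A generator of this shape is what would feed a streaming audio output; a real Gradio demo would most likely yield NumPy arrays or bytes rather than Python lists. The lag described here happens after the first `yield`, so it is not something the generator itself can remove.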

If we could reduce this to near zero additional overhead, it would make showcasing streaming outputs in Gradio much more feasible.

cc @aliabd @abidlabs @hannahblair @ylacombe

sanchit-gandhi commented 2 weeks ago

Related to #8177, but the MWE demonstrates that playback does not wait for the full audio to be streamed; rather, there's a fixed lag after the first chunk is received.

sanchit-gandhi commented 3 days ago

Any luck with this @aliabd?