[Audio] Microphone Capture - Allow setting smaller chunk size for low latency

gradio-app / gradio

[Audio] Microphone Capture - Allow setting smaller chunk size for low latency #6526

Open virajkarandikar opened 8 months ago

virajkarandikar commented 8 months ago

[x] I have searched to see if a similar issue already exists.

By default, streaming microphone capture uses a buffer/chunk size of about 1 second. This adds considerable latency in real-time applications. Can the chunk size be made configurable or smaller?

Is your feature request related to a problem? Please describe.
A large buffer increases audio latency and makes the application sluggish to use.

Describe the solution you'd like
Provide a parameter to configure the chunk size when using streaming mic capture.

abidlabs commented 8 months ago

Hi @virajkarandikar, can you provide sample code we can use to look into the issue?

virajkarandikar commented 8 months ago

The code is simple.

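import gradio as gr

with gr.Blocks() as demo:
    audio = gr.Audio(streaming=True)

    def process_audio(audio):
        # Each streamed chunk arrives as a (sample_rate, samples) tuple.
        rate, data = audio
        print(f"rate: {rate}, samples: {len(data)}")

    # Call process_audio for every chunk streamed from the microphone.
    audio.stream(process_audio, [audio], None)

demo.launch()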

Below is the log I get on console.

rate: 48000, samples: 24000
rate: 48000, samples: 48000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000

The log indicates a sample rate of 48000 Hz, 1 channel, and a chunk size that varies between 24000 samples (0.5 sec) and 48000 samples (1 sec). This adds significant latency.
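To make the arithmetic explicit, here is a minimal sketch that converts the logged sample counts into chunk durations. The helper name `chunk_duration_seconds` is made up for illustration and is not part of Gradio; the sample counts are copied from the log above.

```python
def chunk_duration_seconds(num_samples: int, sample_rate: int) -> float:
    """Duration of one chunk: number of samples divided by samples per second."""
    return num_samples / sample_rate


# Sample counts copied from the console log; the rate was 48000 Hz throughout.
logged_samples = [24000, 48000, 24000, 24000, 24000, 24000]
durations = [chunk_duration_seconds(n, 48000) for n in logged_samples]

print(durations)  # [0.5, 1.0, 0.5, 0.5, 0.5, 0.5]
```

Each chunk cannot be processed until it has been fully captured, so the chunk duration is a lower bound on the delay between speaking and the callback seeing that audio.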

Also, the uncompressed audio data at 48000 Hz is streamed from the client to the application, which adds some network latency. In my case, the model expects a 16000 Hz sample rate. If I could specify the sample rate for mic capture, it would cut the amount of data transferred to one third. For that I have filed a separate issue here: https://github.com/gradio-app/gradio/issues/5848.
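As a rough back-of-the-envelope check of the data-rate argument, the sketch below compares raw PCM data rates at the two sample rates. It assumes uncompressed 16-bit mono samples (2 bytes each); the actual wire format used between the browser and the app is not stated in this thread, and `pcm_bytes_per_second` is a hypothetical helper, not a Gradio function.

```python
def pcm_bytes_per_second(sample_rate: int, bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Raw PCM data rate. Assumption: 16-bit (2-byte) samples, mono."""
    return sample_rate * bytes_per_sample * channels


captured = pcm_bytes_per_second(48000)  # rate seen in the log
target = pcm_bytes_per_second(16000)    # rate the model expects
saving = 1 - target / captured          # fraction of data no longer sent

print(f"{captured} B/s -> {target} B/s, saving {saving:.0%}")
# 96000 B/s -> 32000 B/s, saving 67%
```

Under these assumptions, capturing at 16000 Hz leaves one third of the original data, a saving of two thirds.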

qianhuiliu commented 8 months ago

Hello, have you figured out how to do it? I have the same question.

virajkarandikar commented 8 months ago

Any update here?

gaborvecsei commented 6 months ago

I am also interested in this

virajkarandikar commented 6 months ago

Ping...

abidlabs commented 6 months ago

This is on our radar, but it may take a few weeks for us to get to it, as we have a lot of other issues we're tackling as well. We are happy to review any PRs if you'd like to contribute this fix.

cc @aliabid94

mcorroyer commented 2 months ago

I have the same issue, has it progressed?

adirajagopal commented 1 month ago

> This is on our radar, but it may take a few weeks for us to get to it, as we have a lot of other issues we're tackling as well. We are happy to review any PRs if you'd like to contribute this fix.
>
> cc @aliabid94

Hi, could you point to the specific part of the codebase that determines the chunk size of the stream, so we can see how to reduce it? That would help guide the PR. Thanks!