Open virajkarandikar opened 8 months ago
Hi @virajkarandikar, can you provide sample code we can use to look at the issue?
The code is simple.
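import gradio as gr

with gr.Blocks() as demo:
    audio = gr.Audio(streaming=True)

    def process_audio(audio):
        # Each streamed chunk arrives as a (sample_rate, samples) tuple.
        rate, data = audio
        print(f"rate: {rate}, samples: {len(data)}")

    # No outputs: the callback only logs the size of each incoming chunk.
    audio.stream(process_audio, [audio], None)

demo.launch()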
Below is the log I get on the console.
rate: 48000, samples: 24000
rate: 48000, samples: 48000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
rate: 48000, samples: 24000
The log indicates a sample rate of 48000 Hz, 1 channel, and a chunk size that varies between 24000 samples (0.5 s) and 48000 samples (1 s). This adds significant latency.
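For anyone checking the numbers, here is a minimal sketch of the arithmetic: a chunk's duration is its sample count divided by the sample rate. The values are copied from the log above; the helper name `chunk_duration_seconds` is just an illustrative name, not part of Gradio.

```python
# Sample rate and chunk sizes copied from the console log above.
SAMPLE_RATE_HZ = 48_000
chunk_sizes = [24_000, 48_000, 24_000, 24_000, 24_000, 24_000]


def chunk_duration_seconds(num_samples: int, sample_rate_hz: int) -> float:
    """Return how much audio (in seconds) a chunk of num_samples holds."""
    return num_samples / sample_rate_hz


durations = [chunk_duration_seconds(n, SAMPLE_RATE_HZ) for n in chunk_sizes]

for n, d in zip(chunk_sizes, durations):
    print(f"{n} samples -> {d:.2f} s")
```

Since a chunk cannot be delivered before it has been fully recorded, the chunk duration is a lower bound on the latency added by buffering alone.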
Also, the uncompressed audio data at 48000 Hz is streamed from the client to the application, which adds some network latency. In my case the model expects a 16000 Hz sample rate, so if I could specify the sample rate for mic capture, it would cut the amount of data transferred to one third. I have filed a separate issue for that here: https://github.com/gradio-app/gradio/issues/5848.
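A rough sketch of the data-rate comparison, assuming 16-bit mono PCM samples (the 2 bytes per sample is an assumption; the thread only states the sample rate and channel count). The function name `bytes_per_second` is illustrative only.

```python
# Assumption: samples are 16-bit PCM (2 bytes each); the thread does not state this.
BYTES_PER_SAMPLE = 2
# Mono, as reported in the thread.
CHANNELS = 1


def bytes_per_second(sample_rate_hz: int,
                     bytes_per_sample: int = BYTES_PER_SAMPLE,
                     channels: int = CHANNELS) -> int:
    """Raw (uncompressed) audio data rate in bytes per second."""
    return sample_rate_hz * bytes_per_sample * channels


capture_rate = bytes_per_second(48_000)  # what the browser sends today
model_rate = bytes_per_second(16_000)    # what the model actually needs

remaining_fraction = model_rate / capture_rate
saved_fraction = 1 - remaining_fraction

print(f"48 kHz: {capture_rate} B/s, 16 kHz: {model_rate} B/s")
print(f"remaining: {remaining_fraction:.2f}, saved: {saved_fraction:.2f}")
```

The ratio does not depend on the assumed sample width: capturing at 16000 Hz instead of 48000 Hz leaves one third of the data, which is a saving of two thirds.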
Hello, have you figured out how to do it? I have the same question.
Any update here?
I am also interested in this
Ping...
This is on our radar, but it may take a few weeks for us to get to it, as we have a lot of other issues we're tackling as well. We are happy to review any PRs if you'd like to contribute this fix.
cc @aliabid94
I have the same issue, has it progressed?
Hi, is there a specific part of the code base you could point to to suggest how we can reduce the chunk size of the stream? This would help with guiding the PR. Thanks!
By default, the streaming mic capture uses a buffer/chunk size of 1 second. This adds significant latency in real-time applications. Can the chunk size be made configurable or smaller?
Is your feature request related to a problem? Please describe.
A large buffer increases audio latency and makes the application sluggish to use.
Describe the solution you'd like
Provide a parameter to configure the chunk size when using streaming mic capture.