Closed Jackiexiao closed 1 year ago
It's been a few months. Are there any plans to implement this?
Sorry for the lack of follow-up on this issue, but this is already possible! There's a basic example in gradio/demo/stream_audio/run.py which streams in input audio and outputs the same audio in a streaming manner. It should be possible to adapt this logic for a voice conversion / TTS demo.
Here's the code:
import gradio as gr
import numpy as np
import time


def add_to_stream(audio, instream):
    time.sleep(1)
    # No new chunk: leave the output untouched and keep the state as-is.
    if audio is None:
        return gr.update(), instream
    if instream is None:
        # First chunk: it becomes the start of the stream.
        ret = audio
    else:
        # Append the new samples to everything received so far.
        ret = (audio[0], np.concatenate((instream[1], audio[1])))
    # Same value goes to the output player and to the stored state.
    return ret, ret


with gr.Blocks() as demo:
    inp = gr.Audio(source="microphone")
    out = gr.Audio()
    stream = gr.State()
    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])

if __name__ == "__main__":
    demo.launch()
I'll go ahead and close the issue as it seems to me that this is solved, but if I'm wrong, feel free to reopen it with more details.
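For anyone adapting this, it may help to see what the state holds after a few chunks. Below is a minimal sketch of the same accumulation logic with the Gradio wiring removed. `append_chunk` is a hypothetical helper name, and the 16 kHz rate and half-second chunk size are made-up values for illustration. Audio is assumed to arrive as a `(sample_rate, samples)` tuple, as in the code above.

```python
import numpy as np


def append_chunk(chunk, history):
    """Mirror the branching in add_to_stream, without any Gradio objects."""
    if chunk is None:
        return history          # nothing new, keep what we have
    if history is None:
        return chunk            # first chunk starts the stream
    sample_rate, samples = chunk
    return sample_rate, np.concatenate((history[1], samples))


# Simulate three half-second microphone chunks at 16 kHz (illustrative values).
sample_rate = 16000
history = None
for _ in range(3):
    chunk = (sample_rate, np.zeros(sample_rate // 2, dtype=np.int16))
    history = append_chunk(chunk, history)

print(history[1].shape)  # (24000,)
```

Note that because `add_to_stream` returns `ret, ret`, the output component receives the full accumulated recording on every call, not just the newest chunk.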
Would you mind adding documentation for this? This was not obvious even after a thorough (~30 min) reading of the docs.
Is there a way to stream audio from a remote source such as when using Amazon Polly or Eleven Labs?
@arjunbansal that's an entirely different question. I'd suggest you open a brand-new issue for more visibility.
I am working on real-time voice conversion lately and intended to use Gradio to present a demo for my paper. However, after trying the provided code, I discovered that it doesn't support real-time audio playback, leaving me uncertain about how to play the converted audio in a streaming manner.
Do you have a solution? I am also testing real-time voice-to-voice. I think we may be able to use yield to stream the audio?
Have you taken a look at the last example on this page? https://www.gradio.app/guides/reactive-interfaces
Should allow you to do real time voice generation
Did yield work? I am trying to implement TTS, and when using yield I get this error: "Need to enable queue to use generators." I am lost on how to proceed.
Hi @prachii1910, all that means is that instead of doing demo.launch(), you should do demo.queue().launch(), where demo here is your Gradio Interface or Blocks.
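As a rough sketch of what that looks like with a generator function: `stream_words` and `launch_demo` are hypothetical names, and the text-in/text-out interface is only a stand-in for a TTS function that yields audio. The Gradio import sits inside `launch_demo` so the generator can be tried on its own.

```python
def stream_words(text):
    """Yield a growing prefix of the input, one word at a time."""
    shown = []
    for word in text.split():
        shown.append(word)
        yield " ".join(shown)


def launch_demo():
    """Wire the generator into Gradio with the queue enabled."""
    import gradio as gr  # imported here so stream_words runs without Gradio

    demo = gr.Interface(fn=stream_words, inputs="text", outputs="text")
    # queue() is the step the error message is asking for.
    demo.queue().launch()


# The generator itself can be checked without starting a server:
print(list(stream_words("hello streaming world")))
# ['hello', 'hello streaming', 'hello streaming world']
```

Calling `launch_demo()` starts the app; each yielded value should replace the previous output as it arrives.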
FYI, the best solution we have tried so far (with help from HF) is this: https://huggingface.co/spaces/coqui/xtts-streaming. The real problem is that the audio length must be known in advance (instead of opening a stream and playing while the input bytestream is appended). Unfortunately, with a bytestream, the audio refreshes on every yield and feels choppy under load.
Edit: This Space now uses byte streaming: https://huggingface.co/spaces/coqui/voice-chat-with-mistral. Unfortunately, audio play/end/change events are not emitted during yield (only on output), so it has to wait until the audio finishes, and the timing is not exact (I am trying to find a way around it).
I want to use Gradio for real-time text-to-speech (TTS) or real-time voice conversion (VC).
It's now possible to do real-time speech-to-text thanks to https://github.com/gradio-app/gradio/pull/800. I wonder if we could do the same thing for TTS/VC.
For example, [paddlespeech]() supports streaming TTS: it accepts text and yields WAV audio chunk by chunk.
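To make the "text in, chunks out" shape concrete, here is a minimal sketch using only the standard library. `fake_tts_chunks` is a hypothetical stand-in and not the paddlespeech API: it emits one short sine tone per word rather than speech, and the 16 kHz rate and chunk size are assumptions.

```python
import math
import struct

SAMPLE_RATE = 16000  # assumed output rate for this sketch


def fake_tts_chunks(text, chunk_samples=1600):
    """Hypothetical streaming-TTS stand-in.

    Yields one chunk of 16-bit little-endian mono PCM bytes per word.
    Each chunk is a sine tone, not synthesized speech.
    """
    for i, _word in enumerate(text.split()):
        freq = 220.0 * (i + 1)  # a different pitch per word
        samples = [
            int(12000 * math.sin(2 * math.pi * freq * t / SAMPLE_RATE))
            for t in range(chunk_samples)
        ]
        yield struct.pack(f"<{chunk_samples}h", *samples)


chunks = list(fake_tts_chunks("hello streaming world"))
print(len(chunks), len(chunks[0]))  # 3 3200
```

A consumer could start playing the first chunk while later ones are still being generated, which is the behavior being asked for here.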
Bidirectional audio streaming is used in real-time voice conversion. As far as I know, there is no open-source real-time voice conversion project on GitHub, but it is possible.
In the simplest case, we can record audio from the microphone, increase the volume or pitch or add an audio effect, and play the audio back with bidirectional streaming.
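The volume case could be sketched per chunk like this. It assumes each chunk is a `(sample_rate, int16 samples)` tuple, which is the shape Gradio's microphone input uses elsewhere in this thread. `apply_gain` and `process_stream` are hypothetical names, and the sample values are made up.

```python
import numpy as np


def apply_gain(chunk, gain=2.0):
    """Scale one chunk and clip the result to the int16 range."""
    sample_rate, samples = chunk
    boosted = samples.astype(np.float32) * gain
    clipped = np.clip(boosted, -32768, 32767).astype(np.int16)
    return sample_rate, clipped


def process_stream(chunks, gain=2.0):
    """Consume incoming chunks lazily and yield one processed chunk each."""
    for chunk in chunks:
        yield apply_gain(chunk, gain)


# One tiny illustrative chunk; the last two samples overflow and get clipped.
mic = [(16000, np.array([1000, -20000, 30000], dtype=np.int16))]
for rate, out in process_stream(mic):
    print(rate, out.tolist())  # 16000 [2000, -32768, 32767]
```

Processing chunk by chunk keeps latency bounded by the chunk length, whereas a pitch change or other effect that needs context across chunk boundaries would have to carry state between calls.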