gradio-app / gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
http://www.gradio.app
Apache License 2.0
33.92k stars 2.57k forks

Request for streaming audio output component #1775

Closed Jackiexiao closed 1 year ago

Jackiexiao commented 2 years ago

I want to use Gradio for real-time text-to-speech (TTS) or real-time voice conversion (VC).

Real-time speech-to-text is now possible thanks to https://github.com/gradio-app/gradio/pull/800. I wonder if we could do the same thing for TTS/VC.

For example, paddlespeech supports streaming TTS: it accepts text and yields WAV audio chunk by chunk.

Bidirectional audio streaming is used in real-time voice conversion. As far as I know, there is no open-source real-time voice conversion project on GitHub, but it is possible.

In the simplest case, we could record audio from the microphone, increase the volume or pitch or add an audio effect, and play the result back with bidirectional streaming.
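
To make the volume case concrete, here is a minimal sketch of a per-chunk effect. It assumes each chunk arrives as a (sample_rate, samples) tuple with int16 NumPy samples, which is how Gradio typically hands over microphone audio; `apply_gain` and its `gain` parameter are illustrative names, not part of Gradio.

```python
import numpy as np


def apply_gain(audio, gain=2.0):
    """Scale one (sample_rate, samples) chunk by `gain`, clipping to the int16 range.

    Assumes `samples` is an int16 NumPy array (an assumption about the input format).
    """
    if audio is None:
        # No chunk received yet: nothing to process.
        return None
    sample_rate, samples = audio
    limits = np.iinfo(np.int16)
    # Multiply in float to avoid integer overflow, then clip back into int16.
    boosted = np.clip(samples.astype(np.float32) * gain, limits.min, limits.max)
    return sample_rate, boosted.astype(np.int16)
```

A function like this could be used as the body of a streaming callback, so each incoming microphone chunk is processed before it is sent to the output component.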

Pandapip1 commented 1 year ago

It's been a few months. Are there any plans to implement this?

abidlabs commented 1 year ago

Sorry for the lack of follow-up on this issue, but this is already possible! There's a basic example in gradio/demo/stream_audio/run.py that streams in input audio and streams the same audio back out. It should be possible to adapt this logic for a voice conversion / TTS demo.

Here's the code:

import gradio as gr
import numpy as np
import time

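def add_to_stream(audio, instream):
    # audio: the latest (sample_rate, samples) chunk from the microphone.
    # instream: the audio accumulated so far, kept in gr.State.
    time.sleep(1)  # artificial one-second delay
    if audio is None:
        # No new chunk: leave the output untouched and keep the state.
        return gr.update(), instream
    if instream is None:
        # First chunk: it becomes the start of the stream.
        ret = audio
    else:
        # Append the new samples to the accumulated stream.
        ret = (audio[0], np.concatenate((instream[1], audio[1])))
    return ret, ret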

with gr.Blocks() as demo:
    inp = gr.Audio(source="microphone")
    out = gr.Audio()
    stream = gr.State()
    clear = gr.Button("Clear")

    inp.stream(add_to_stream, [inp, stream], [out, stream])
    clear.click(lambda: [None, None, None], None, [inp, out, stream])

if __name__ == "__main__":
    demo.launch()

I'll go ahead and close the issue as it seems to me that this is solved, but if I'm wrong, feel free to reopen it with more details.

Pandapip1 commented 1 year ago

Would you mind adding documentation for this? This was not obvious even after a thorough (~30 min) reading of the docs.

arjunbansal commented 1 year ago

Is there a way to stream audio from a remote source such as when using Amazon Polly or Eleven Labs?

Pandapip1 commented 1 year ago

@arjunbansal that's an entirely different question. I'd suggest you open a brand-new issue for more visibility.

NZqian commented 1 year ago

I have been working on real-time voice conversion lately and intended to use Gradio to present a demo for my paper. However, after trying the provided code, I found that it doesn't support real-time audio playback, so I'm unsure how to play the converted audio in a streaming manner.

BenjiKCF commented 1 year ago

@NZqian Do you have a solution? I am also testing real-time voice-to-voice. I think we may be able to use yield to stream the audio?

abidlabs commented 1 year ago

Have you taken a look at the last example on this page? https://www.gradio.app/guides/reactive-interfaces

That should allow you to do real-time voice generation.

prachii1910 commented 1 year ago

Did yield work? I am trying to implement TTS, and when I use yield I get this error: "Need to enable queue to use generators." I am lost on how to proceed.

abidlabs commented 1 year ago

Hi @prachii1910, all that means is that instead of calling demo.launch(), you should call demo.queue().launch(), where demo is your Gradio Interface or Blocks.
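
As a rough illustration of that fix, the sketch below pairs a generator callback with an enabled queue. `synthesize_chunks` is a hypothetical stand-in for a streaming TTS model (it just emits one short tone per word), and `SAMPLE_RATE`, `build_demo`, and the component labels are made-up names. Whether successive yields replace or extend the audio in the output player depends on the Gradio version and how the output component is configured, so treat this as a starting point only.

```python
import numpy as np

SAMPLE_RATE = 16000  # assumed sample rate for the placeholder audio


def synthesize_chunks(text, chunk_seconds=0.25):
    """Hypothetical stand-in for a streaming TTS model: yields one tone per word."""
    t = np.arange(int(SAMPLE_RATE * chunk_seconds)) / SAMPLE_RATE
    for i, _word in enumerate(text.split()):
        freq = 220.0 * (i + 1)
        samples = (0.3 * np.sin(2 * np.pi * freq * t) * 32767).astype(np.int16)
        # Each yield hands one (sample_rate, samples) chunk to the output component.
        yield SAMPLE_RATE, samples


def build_demo():
    # Imported lazily so the generator above can be used without Gradio installed.
    import gradio as gr

    with gr.Blocks() as demo:
        text = gr.Textbox(label="Text")
        out = gr.Audio()
        gr.Button("Speak").click(synthesize_chunks, text, out)
    # Generator callbacks need the queue; without it Gradio reports
    # "Need to enable queue to use generators."
    demo.queue()
    return demo
```

Calling `build_demo().launch()` would then start the app with the queue already enabled.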

gorkemgoknar commented 1 year ago

FYI, the best solution we have tried so far (with help from HF) is this: https://huggingface.co/spaces/coqui/xtts-streaming. The real problem is that the audio length must be known (instead of opening a stream and playing while the input bytestream is appended). Unfortunately, with a bytestream, the audio feels clippy on the refresh at every yield (under load).

Edit: This space now uses byte streaming: https://huggingface.co/spaces/coqui/voice-chat-with-mistral. Unfortunately, audio play/end/change events are not emitted during yield (only on output), so it has to wait until the audio finishes, and the timing is not exact (I am trying to find a way around it).
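
For readers wondering what byte streaming can look like on the producer side, here is a small standard-library sketch. It wraps each raw PCM chunk in its own WAV container, which also shows why length matters: the WAV header records how much audio data follows. This is not the code used in the spaces linked above; `pcm_to_wav_bytes`, `wav_chunks`, and the 24000 Hz default are assumptions chosen for illustration.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw little-endian PCM bytes in a self-contained WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        # The header written on close stores the length of this data.
        wav.writeframes(pcm)
    return buf.getvalue()


def wav_chunks(pcm_chunks, sample_rate=24000):
    """Turn an iterable of raw PCM chunks into a stream of WAV byte strings."""
    for pcm in pcm_chunks:
        if pcm:  # skip empty chunks
            yield pcm_to_wav_bytes(pcm, sample_rate)
```

A generator callback could yield these byte strings one at a time; whether the output component accepts raw bytes, and how smoothly it joins consecutive chunks, depends on the Gradio version in use.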