huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0
4.17k stars 409 forks

Stream and play #114

Open RobinWitch opened 3 weeks ago

RobinWitch commented 3 weeks ago

Streaming output would be really useful for me, something like:

for (sampling_rate, audio_chunk) in generate(text, description, chunk_size_in_s):
  # You can do everything that you need with the chunk now
  # For example: stream it, save it, play it.
  print(audio_chunk.shape) 

But I'm sorry, I'm a beginner and this is my first time learning about the streaming feature. Could you provide a specific example of streaming and playing the audio?
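In case it helps other beginners: here is a minimal sketch of what you could do with each chunk. The `fake_generate` helper below is purely hypothetical, it stands in for the real parler-tts streaming generator so the example is self-contained, and it writes the stream to a WAV file using only the standard library plus numpy:

```python
import wave
import numpy as np

def fake_generate(n_chunks=5, sampling_rate=44100, chunk_size_in_s=0.5):
    """Hypothetical stand-in for the real streaming generator:
    yields (sampling_rate, float32 audio chunk) tuples."""
    samples = int(sampling_rate * chunk_size_in_s)
    for _ in range(n_chunks):
        yield sampling_rate, np.random.uniform(-0.1, 0.1, samples).astype(np.float32)

# Write every chunk to a WAV file as it arrives.
with wave.open("stream.wav", "wb") as f:
    f.setnchannels(1)   # mono
    f.setsampwidth(2)   # 16-bit PCM
    f.setframerate(44100)
    for sampling_rate, audio_chunk in fake_generate():
        # Convert float32 in [-1, 1] to int16 before writing.
        pcm = (audio_chunk * 32767).astype(np.int16)
        f.writeframes(pcm.tobytes())
```

For live playback you could hand each `pcm` buffer to a playback library such as `sounddevice` instead of (or as well as) writing it to disk; saving to a file is just the simplest way to verify the chunks look right.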

tsdocode commented 3 weeks ago

I have a gradio example for it: https://gist.github.com/tsdocode/6be4bc8d5c63321d933d404adbfdfa7a

RobinWitch commented 3 weeks ago

Thank you very much, it helps me a lot!


cesinsingapore commented 1 week ago

Here is my example that makes playback smoother, but I still hear static between the chunks. If anyone can enhance it to play more smoothly, please submit a revision: https://gist.github.com/cesinsingapore/5147b0fcd63ba6aa22e0faa1a52ba249

tsdocode commented 1 week ago

This small script adds a crossfade between chunks and avoids the static noise:

    import io
    from pydub import AudioSegment

    # Start with a short span of silence so the first append has
    # something to crossfade against.
    combined_audio = AudioSegment.silent(duration=100)

    # Process each raw 16-bit mono PCM chunk and stitch it on with a crossfade.
    for chunk in audio_chunks:
        audio_segment = AudioSegment.from_raw(
            io.BytesIO(chunk), sample_width=2, frame_rate=44100, channels=1
        )

        if len(combined_audio) > 0:
            # Crossfade over 10% of the chunk's duration. len(audio_segment)
            # is already in milliseconds, so no sample-rate math is needed.
            # Clamp to the accumulated audio's length, since pydub rejects a
            # crossfade longer than either segment.
            crossfade_duration = min(int(0.1 * len(audio_segment)), len(combined_audio))
        else:
            crossfade_duration = 0

        combined_audio = combined_audio.append(audio_segment, crossfade=crossfade_duration)
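For anyone who wants to avoid the pydub dependency: the linear crossfade that `append(..., crossfade=...)` performs can be sketched directly with numpy. The `crossfade_concat` helper below is a hypothetical illustration (not part of parler-tts or pydub), assuming two mono chunks at the same sample rate:

```python
import numpy as np

def crossfade_concat(a, b, fade_samples):
    """Overlap the tail of `a` with the head of `b`: `a` fades out
    linearly while `b` fades in, then the rest of `b` is appended."""
    fade_out = np.linspace(1.0, 0.0, fade_samples)
    fade_in = 1.0 - fade_out
    overlap = a[-fade_samples:] * fade_out + b[:fade_samples] * fade_in
    return np.concatenate([a[:-fade_samples], overlap, b[fade_samples:]])

# Two toy "chunks": constant +1.0 followed by constant -1.0.
a = np.ones(1000)
b = -np.ones(1000)
out = crossfade_concat(a, b, fade_samples=100)
# Total length shrinks by the overlap: 1000 + 1000 - 100 samples.
```

The boundary now ramps smoothly from +1.0 to -1.0 over the 100-sample overlap instead of jumping, which is exactly the discontinuity that produces the audible static.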