huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0

streaming smooth #124

Open cesinsingapore opened 3 months ago

cesinsingapore commented 3 months ago

I tried to create a streaming sample here:

https://gist.github.com/cesinsingapore/5147b0fcd63ba6aa22e0faa1a52ba249

But the problem is that playback between chunks isn't smooth enough. Can anyone make it smooth?
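The gist isn't reproduced here, so what follows is only a hypothetical sketch of the usual structure for chunked streaming. A background thread pushes generated chunks into a queue, and the main thread plays them. Playback starts only after a few chunks are buffered. `stream_chunks`, `write` and `prebuffer` are illustrative names, not part of parler-tts. `write` stands in for a blocking output call, such as writing to a sounddevice or PyAudio output stream.

```python
import queue
import threading
from typing import Callable, Iterable, Sequence

_SENTINEL = object()  # marks the end of the stream


def stream_chunks(
    chunks: Iterable[Sequence[float]],
    write: Callable[[Sequence[float]], None],
    prebuffer: int = 2,
) -> None:
    """Play audio chunks through `write` while later chunks are still being produced.

    `chunks` is any iterable yielding generated audio (hypothetical here);
    `write` is assumed to be a blocking audio-output call. Playback begins once
    `prebuffer` chunks are queued, or as soon as generation finishes if it
    produces fewer chunks than that. Error handling is omitted for brevity.
    """
    q: "queue.Queue[object]" = queue.Queue()
    ready = threading.Event()

    def producer() -> None:
        count = 0
        for chunk in chunks:
            q.put(chunk)
            count += 1
            if count >= prebuffer:
                ready.set()  # enough audio buffered: allow playback to start
        q.put(_SENTINEL)
        ready.set()  # short streams: start playing whatever exists

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    ready.wait()  # hold playback until the prebuffer is filled
    while True:
        item = q.get()
        if item is _SENTINEL:
            break
        write(item)  # blocks for roughly the chunk's duration on a real device
    t.join()
```

Decoupling generation from playback this way means a slow chunk only causes a gap if the queue runs dry. A larger `prebuffer` trades startup latency for fewer gaps.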

haixuanTao commented 3 months ago

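So, if you hear sudden gaps of silence, it means generation is not fast enough to play the audio in one go. You could try to either:

  • Wait before playing the first generated audio, scaling the wait time with the generation length
  • Try to speed up generation with things like torch.compile()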

cesinsingapore commented 3 months ago

> So, if you hear sudden gaps of silence, it means generation is not fast enough to play the audio in one go. You could try to either:
>
>   • Wait before playing the first generated audio, scaling the wait time with the generation length
>   • Try to speed up generation with things like torch.compile()
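The first suggestion can be made concrete with a simplifying assumption that the thread does not state: generation runs at a constant speed. Suppose producing A seconds of audio takes G seconds of wall time. By time t, generation has produced (A/G)·t seconds of audio, and playback that started after a delay W has consumed t - W seconds. Playback must not overtake generation before generation ends at t = G, which gives W >= G - A. `min_prebuffer_seconds` is a hypothetical helper.

```python
def min_prebuffer_seconds(audio_seconds: float, generation_seconds: float) -> float:
    """Smallest playback delay that avoids gaps, assuming a constant generation speed.

    If producing `audio_seconds` of audio takes `generation_seconds` of wall time,
    playback started after the returned delay never catches up with generation.
    Returns 0.0 when generation is faster than real time.
    """
    if audio_seconds <= 0 or generation_seconds <= 0:
        raise ValueError("durations must be positive")
    # Worst case is the moment generation finishes: W >= G - A.
    return max(0.0, generation_seconds - audio_seconds)
```

For example, if 10 s of speech takes 15 s to generate, a delay of about 5 s avoids gaps. Real generation speed varies from chunk to chunk, so some extra margin is sensible.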

I'm not experiencing sudden gaps, but between the generated chunks there is something like a static sound (let's just call it noise); I don't know what it's called in English. Is that for the same reason as the first point?
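The thread does not settle where this noise comes from. One common cause of clicks at chunk seams is an abrupt jump in the waveform where one chunk ends and the next begins. If that turns out to be the issue here, a short linear crossfade across each seam is a standard mitigation. The sketch below is an assumption, not what the gist or parler-tts does. `crossfade_concat` and `fade` are hypothetical names, and the function works on plain lists of float samples.

```python
from typing import List, Sequence


def crossfade_concat(chunks: Sequence[Sequence[float]], fade: int) -> List[float]:
    """Join chunks, blending the tail of the output so far with the head of the
    next chunk over `fade` samples using a linear ramp.

    Each seam shortens the output by the number of blended samples. The fade is
    clamped to the samples actually available on both sides of the seam.
    """
    out: List[float] = []
    for chunk in chunks:
        chunk = list(chunk)
        n = min(fade, len(out), len(chunk))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming chunk rises toward 1
            j = len(out) - n + i
            out[j] = out[j] * (1.0 - w) + chunk[i] * w
        out.extend(chunk[n:])  # append the rest of the chunk unchanged
    return out
```

A fade of a few milliseconds (for example `fade = sample_rate // 200` for 5 ms) is a typical starting point. In a live stream, this means holding back the last `fade` samples of each chunk until the next chunk arrives.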

haixuanTao commented 3 months ago

Yeah, I'm not too sure. It could be coming from sounddevice. Could you try recording the audio to a file to see whether the noise is still there when you play it back?
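Here is a minimal way to follow this suggestion without an audio device, using only Python's standard library. The sketch concatenates the chunks, writes them to a 16-bit WAV file, and reports the sample jump at each seam. It assumes mono chunks given as sequences of floats in [-1, 1]. `save_chunks_as_wav` and `seam_jumps` are hypothetical helpers, and `sample_rate` should be whatever rate the model outputs. If the noise is audible in the file, it is in the generated audio or in how chunks are joined. If it is not, the playback path (for example sounddevice) is the more likely cause.

```python
import array
import sys
import wave
from typing import List, Sequence


def save_chunks_as_wav(path: str, chunks: Sequence[Sequence[float]], sample_rate: int) -> int:
    """Concatenate float chunks in [-1, 1] and write them as 16-bit mono PCM.

    Samples outside [-1, 1] are clipped. Returns the number of frames written.
    """
    pcm = array.array("h")  # signed 16-bit integers
    for chunk in chunks:
        for x in chunk:
            x = max(-1.0, min(1.0, float(x)))
            pcm.append(int(round(x * 32767)))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return len(pcm)


def seam_jumps(chunks: Sequence[Sequence[float]]) -> List[float]:
    """Absolute jump in sample value at each boundary between consecutive non-empty chunks."""
    nonempty = [c for c in chunks if len(c) > 0]
    return [abs(b[0] - a[-1]) for a, b in zip(nonempty, nonempty[1:])]
```

Large values from `seam_jumps` at exactly the points where the noise is heard would support the seam-discontinuity explanation.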

I switched to PyAudio for streaming the first audio. Maybe that could help.

FYI, I can speak Chinese too.