Open Devloper-RG opened 1 week ago
Hey @Devloper-RG, on what device are you running the pipeline?
@eustlb I'm running the server on a Google Cloud Platform (GCP) VM with 2 NVIDIA T4 GPUs, and the client is on my local machine.
I never tried this setup. There are two possibilities for choppy audio:

1. Parler-TTS generation may not be fast enough on a T4. It's only with torch.compile (modes reduce-overhead and max-autotune) with Parler-TTS + streaming that it could work on a T4. What you can try here is increasing play_steps_s → you'll increase latency, yet you'll also reduce the number of DAC decoding steps.
2. You can switch from Parler-TTS to MeloTTS by setting the --tts melo flag. You'll lose text-to-speech generation streaming, which will increase latency, but it also removes this point as a possibility for choppy audio.

Otherwise, keep increasing play_steps_s until you no longer experience choppy audio. Also, can you give me the command you're running?
Also, beware that I don't think the code uses multiple GPUs yet, so 2 T4s is the same as 1.
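Since only one GPU is used anyway, it may be worth pinning the server process to a single device so the second T4 isn't initialized needlessly — a sketch using the standard CUDA_VISIBLE_DEVICES variable:

```shell
# Expose only the first T4 to the process; the second GPU stays idle.
CUDA_VISIBLE_DEVICES=0 python s2s_pipeline.py --recv_host 0.0.0.0 --send_host 0.0.0.0
```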
@eustlb
I'll implement the solutions you suggested and will update you if they work or if I find the underlying issue. As requested, here are the terminal commands I used:
Client side: python listen_and_play.py --host <IP address>
Server side: python s2s_pipeline.py --recv_host 0.0.0.0 --send_host 0.0.0.0
I'm experiencing issues with breaks in the generated voice output, seemingly caused by latency in the text-to-speech (TTS) conversion process. The audio output has occasional breaks, which disrupt the flow of speech.
Steps I've tried:
- Decreasing block size: this helped reduce some latency in delivering TTS audio output, but the issue persists.
- Adjusting play_steps_s: I've decreased this parameter to minimize latency. However, setting play_steps_s below 0.5 causes errors, so I've kept it at 0.5 for now.
Any suggestions on how to further reduce the latency and improve the smoothness of the audio output would be greatly appreciated.
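To make the play_steps_s trade-off concrete, here is a back-of-the-envelope sketch (the 44.1 kHz output sample rate and the 10 s utterance length are assumptions for illustration, not values from the pipeline):

```python
# Rough model of the play_steps_s trade-off: each decode call yields
# play_steps_s seconds of audio, so larger chunks mean fewer decode
# calls (fewer chances for playback underruns) but a longer wait
# before the first chunk can start playing.
SAMPLE_RATE = 44_100  # assumed output sample rate

def chunk_stats(play_steps_s: float, total_audio_s: float = 10.0):
    samples_per_chunk = int(play_steps_s * SAMPLE_RATE)
    num_decode_calls = max(1, round(total_audio_s / play_steps_s))
    first_audio_delay_s = play_steps_s  # one full chunk before playback
    return samples_per_chunk, num_decode_calls, first_audio_delay_s

for steps in (0.5, 1.0, 2.0):
    print(steps, chunk_stats(steps))
# 0.5 → (22050, 20, 0.5); 2.0 → (88200, 5, 2.0)
```

So raising play_steps_s from 0.5 to 2.0 quadruples the time to first audio but cuts the number of decode calls by 4×, which is why it can smooth out choppy playback on a slow GPU.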