Closed · AnticPan closed this 2 weeks ago

Hi there,

I've noticed that processing a 2-minute audio file takes about 3.5 minutes with the current default settings in inference_long.py, i.e. a real-time factor (RTF) of roughly 1.75. I'm wondering if there are any plans to optimize the speed. For reference, the https://github.com/SWivid/F5-TTS project, which also uses flow matching, achieves an inference RTF of 0.15. Thank you for your hard work, and I appreciate any insights you can provide!

---

Hi! Thank you for reporting the slowness. Most of the inference latency comes from the BigVGAN vocoder. If you are on a Linux machine with an NVIDIA GPU, you can try setting BigVGAN to use its CUDA kernels; this is currently set to False to maintain compatibility with other hardware. I am also planning to release a new, smaller, and more capable model soon, which should hopefully reduce latency as well.
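As a rough sketch of what flipping that switch can look like: NVIDIA's reference BigVGAN exposes a `use_cuda_kernel` flag on `from_pretrained` that swaps the anti-aliased activations for fused CUDA kernels (compiled on first use). The checkpoint name below is an assumption for illustration; this repo may ship a different checkpoint and load its vocoder elsewhere.

```python
import torch
import bigvgan  # NVIDIA's reference BigVGAN package (https://github.com/NVIDIA/BigVGAN)

# The fused kernels need Linux + an NVIDIA GPU (and nvcc to compile on first use),
# which is why the repo defaults the flag to False for portability.
use_cuda = torch.cuda.is_available()

vocoder = bigvgan.BigVGAN.from_pretrained(
    "nvidia/bigvgan_v2_24khz_100band_256x",  # hypothetical checkpoint; use the one this repo ships
    use_cuda_kernel=use_cuda,
)
vocoder.remove_weight_norm()  # standard BigVGAN step before inference
vocoder = vocoder.eval().to("cuda" if use_cuda else "cpu")
```

Whether this helps depends on how much of the 3.5 minutes is actually spent in the vocoder versus the flow-matching sampler, so it may be worth profiling both stages first.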