shivammehta25 / Matcha-TTS

[ICASSP 2024] 🍵 Matcha-TTS: A fast TTS architecture with conditional flow matching
https://shivammehta25.github.io/Matcha-TTS/
MIT License

Matcha-TTS has very low GPU utilization. #73

Closed Whyki closed 6 months ago

Whyki commented 6 months ago

First of all, thanks for a very nice TTS system. It is very interesting and inspiring.

  1. I tried to train it, but training seems very slow: I see 0.5 to 1.6 iterations per second. At the same time, GPU utilization is negligible. Is that to be expected? (Running on an A100 GPU.)

  2. I expect so: judging by the number of parameters, the model is quite small, so I assume the slowness comes from the CPU-based MAS (monotonic alignment search) computation.

  3. Are you aware of any reason why the alignment implemented by the FastPitch team would not work? The implementation in NVIDIA's Deep Learning Examples is very snappy. https://arxiv.org/abs/2108.10447

Thanks for your great work and insights!

shivammehta25 commented 6 months ago

Thank you very much for your kind words :)

I tried to train it, but training seems very slow: I see 0.5 to 1.6 iterations per second. At the same time, GPU utilization is negligible. Is that to be expected? (Running on an A100 GPU.)

I expect so: judging by the number of parameters, the model is quite small, so I assume the slowness comes from the CPU-based MAS (monotonic alignment search) computation.

Precisely! It's the MAS that takes the most time. What you can do is, once the model is reasonably trained, save the alignments and then train using those saved alignments instead of recomputing them. (I do this too for other purposes, and it works like a charm.)
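For readers wondering why this step is hard to speed up, here is a minimal pure-Python sketch of a monotonic alignment search. It is an illustration under assumptions, not the code used in this repository: the function name `monotonic_alignment_search` and the toy `log_p` matrix are made up for the example. Each cell of the dynamic-programming table depends on the previous frame, so the work is sequential per utterance and the GPU idles while it runs.

```python
NEG_INF = float("-inf")


def monotonic_alignment_search(log_p):
    """Find the best monotonic path through a (n_text x n_mel) score matrix.

    log_p[i][j] is the log-likelihood of mel frame j under text token i.
    Returns a binary matrix of the same shape with exactly one 1 per frame.
    Hypothetical helper written for illustration only.
    """
    n_text, n_mel = len(log_p), len(log_p[0])
    if n_text > n_mel:
        raise ValueError("need at least one mel frame per text token")

    # Forward pass: q[i][j] = best score of any monotonic path ending at (i, j).
    q = [[NEG_INF] * n_mel for _ in range(n_text)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_mel):
        for i in range(min(j + 1, n_text)):
            stay = q[i][j - 1]                          # same token as previous frame
            move = q[i - 1][j - 1] if i > 0 else NEG_INF  # advance by one token
            q[i][j] = max(stay, move) + log_p[i][j]

    # Backward pass: trace the best path from the last token at the last frame.
    path = [[0] * n_mel for _ in range(n_text)]
    i = n_text - 1
    for j in range(n_mel - 1, -1, -1):
        path[i][j] = 1
        if i > 0 and q[i - 1][j - 1] > q[i][j - 1]:
            i -= 1
    return path


# Toy example: 3 tokens, 6 frames, scores that favour durations [2, 1, 3].
true_durations = [2, 1, 3]
frame_owner = [tok for tok, d in enumerate(true_durations) for _ in range(d)]
log_p = [[0.0 if frame_owner[j] == i else -5.0 for j in range(6)] for i in range(3)]

path = monotonic_alignment_search(log_p)
durations = [sum(row) for row in path]  # frames assigned to each token
print(durations)
```

Summing each row of the path gives a per-token duration, which is the quantity worth saving once the model aligns reasonably well.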

Are you aware of any reason why the alignment implemented by the FastPitch team would not work? The implementation in NVIDIA's Deep Learning Examples is very snappy. https://arxiv.org/abs/2108.10447

It would work just fine! I tried it, but at least on my consumer-grade GPU it gave similar results, so I stuck with the easier codebase. Feel free to replace it and post your experience.

Hope it helps!

Whyki commented 6 months ago

Thanks for such a quick response! I'll try the aligner and share my results.

shivammehta25 commented 6 months ago

Hello! I have added the source code for this, along with documentation on how to use it. Hopefully it will help :)

https://github.com/shivammehta25/Matcha-TTS/wiki/Improve-GPU-utilisation-by-extracting-phoneme-alignments

Kind Regards, Shivam
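
As a rough illustration of the "save the alignments, then train from them" idea discussed in this thread, the sketch below caches per-token durations on disk and reads them back. Everything here is an assumption made for the example: the helper names, the one-JSON-file-per-utterance layout, and the utterance ID `utt_0001` are hypothetical and are not taken from the repository or the wiki page linked above.

```python
import json
import tempfile
from pathlib import Path


def durations_from_alignment(alignment):
    """Collapse a binary (n_text x n_mel) alignment into per-token frame counts."""
    return [sum(row) for row in alignment]


def save_durations(cache_dir, utt_id, durations):
    """Write one utterance's durations to <cache_dir>/<utt_id>.json (assumed layout)."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    out_file = cache_dir / f"{utt_id}.json"
    out_file.write_text(json.dumps(durations))
    return out_file


def load_durations(cache_dir, utt_id):
    """Read cached durations back, so no alignment search is needed at train time."""
    return json.loads((Path(cache_dir) / f"{utt_id}.json").read_text())


def expand_to_frames(durations):
    """Map every mel frame to the index of the token that owns it."""
    frame_to_token = []
    for token_index, n_frames in enumerate(durations):
        frame_to_token.extend([token_index] * n_frames)
    return frame_to_token


# A hand-written alignment for 3 tokens over 6 frames (illustrative data only).
alignment = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    save_durations(tmp_dir, "utt_0001", durations_from_alignment(alignment))
    cached = load_durations(tmp_dir, "utt_0001")

frame_to_token = expand_to_frames(cached)
print(cached, frame_to_token)
```

With durations cached like this, a data loader could return them alongside the text and mel features, so the alignment search is skipped during training. How the real training code consumes them is described in the linked wiki page, not here.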