Closed Whyki closed 6 months ago
Thank you very much for your kind words :)
I tried to train it, but training seems very slow: I see 0.5 to 1.6 iterations per second. At the same time, GPU utilization is negligible. Is that to be expected? (Running on an A100 GPU.)
I expect so. Judging by the number of parameters, the model is quite small, so I assume the slowdown comes from the CPU-based MAS computation.
Precisely! It's the MAS that takes the most time. Once the model is reasonably trained, you can save the alignments and then train using those saved alignments. (I do this too for other purposes, and it works like a charm.)
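For anyone who wants to try the "save the alignments and reuse them" idea, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`monotonic_alignment_search`, `cached_durations`), the `utt_id` and `cache_dir` parameters, and the JSON cache format are all hypothetical and only illustrate the approach. The search is a straightforward dynamic-programming version of monotonic alignment search, assuming every text token gets at least one frame and that there are at least as many frames as tokens.

```python
import json
import math
import os
import tempfile


def monotonic_alignment_search(log_prob):
    """Return per-token durations for the best monotonic alignment.

    log_prob[i][j] is the log-likelihood of frame j under text token i.
    Assumes n_frames >= n_tokens; each token receives at least one frame.
    """
    n_tokens, n_frames = len(log_prob), len(log_prob[0])
    neg_inf = -math.inf
    # q[i][j]: best cumulative score of a path that ends at token i, frame j.
    q = [[neg_inf] * n_frames for _ in range(n_tokens)]
    q[0][0] = log_prob[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_tokens)):
            stay = q[i][j - 1]  # same token as the previous frame
            advance = q[i - 1][j - 1] if i > 0 else neg_inf  # next token
            q[i][j] = max(stay, advance) + log_prob[i][j]
    # Backtrack from the last token at the last frame, counting frames.
    durations = [0] * n_tokens
    i = n_tokens - 1
    for j in range(n_frames - 1, -1, -1):
        durations[i] += 1
        if i > 0 and (i == j or q[i - 1][j - 1] >= q[i][j - 1]):
            i -= 1
    return durations


def cached_durations(utt_id, log_prob, cache_dir):
    """Load durations for utt_id from cache_dir, or compute and store them."""
    path = os.path.join(cache_dir, f"{utt_id}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)  # cache hit: the slow search is skipped
    durations = monotonic_alignment_search(log_prob)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(durations, f)
    return durations


# Toy example: token 0 fits frames 0-2 best, token 1 fits frame 3 best.
toy_log_prob = [[0.0, 0.0, 0.0, -5.0], [-5.0, -5.0, -5.0, 0.0]]
with tempfile.TemporaryDirectory() as tmp_dir:
    first = cached_durations("utt_0001", toy_log_prob, tmp_dir)   # computes
    second = cached_durations("utt_0001", toy_log_prob, tmp_dir)  # reads cache
    print(first, second)
```

In a real training loop the first pass would write one file per utterance, and later runs would read the durations in the data loader, so the CPU-bound search no longer sits in the training step. Note that cached alignments go stale if the text processing or the mel settings change.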
Are you aware of any reason why the alignment implemented by the FastPitch team would not work? The implementation in NVIDIA's DeepLearningExamples is very snappy. https://arxiv.org/abs/2108.10447
It would work just fine! I tried it, but at least on my consumer-grade GPU it gave similar results, so I stuck with the simpler codebase. Feel free to replace it and post your experience.
Hope it helps!
Thanks for such a quick response! I'll try the aligner and share my results.
Hello! I have added the source code for this, along with documentation. Hopefully it will help :)
Kind Regards, Shivam
First of all, thanks for a very nice TTS system. It is very interesting and inspiring.
Thanks for your great work and insights!