rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

performance of medium vs high quality #431

Open bzp83 opened 2 months ago

bzp83 commented 2 months ago

Hello! I trained two voices from scratch, one at medium quality and the other at high quality.

When I export them to ONNX and test them, the medium voice has a real-time factor (RTF) of around 0.09, which is very fast. The high-quality voice, however, has an RTF of around 0.55. That is much slower, and I don't hear any real difference in quality.
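For reference, RTF is the time spent synthesizing divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below shows one way to measure it. `synthesize` is a hypothetical stand-in for whatever call returns the generated audio samples. It is not Piper's API, and the sample rate you pass in is assumed to match the voice's output rate.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, num_samples: int, sample_rate: int) -> float:
    """RTF = synthesis time / audio duration (below 1.0 is faster than real time)."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[str], Sequence[float]],  # hypothetical synthesis callable
    text: str,
    sample_rate: int,  # assumed output sample rate of the voice, in Hz
) -> float:
    """Time one synthesis call and convert the result to an RTF."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples), sample_rate)


if __name__ == "__main__":
    # Ratio of the two RTFs reported in this issue.
    medium_rtf = 0.09
    high_rtf = 0.55
    print(f"high is about {high_rtf / medium_rtf:.1f}x slower than medium")
```

With the numbers above, the high-quality voice takes roughly six times longer than the medium one to produce the same amount of audio.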

Is this expected?

I'm running it on Windows.

thanks!

synesthesiam commented 2 months ago

I typically train the high-quality voices at a higher sample rate as well, which contributes to them sounding better. For some datasets, though, there won't be a major difference.