rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

New language from scratch #452

Open codevladimir opened 1 month ago

codevladimir commented 1 month ago

Hello everyone,

I've recently begun experimenting with TTS, and to learn more about it, I'm eager to add my native language from the ground up. From what I understand, a substantial amount of training data is crucial for good results. Starting with 7-8 hours of audio would be a good foundation, but around 24 hours seems ideal (my reference point is the LJ Speech dataset; please correct me if I'm mistaken).
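To check how close a recorded dataset is to the 7-8 hour or 24 hour marks mentioned above, one can simply sum the durations of the WAV files. The sketch below uses only Python's standard `wave` module; the folder layout (all clips as `.wav` files in one directory) is an assumption, and `total_hours` is a hypothetical helper, not part of Piper.

```python
import wave
from pathlib import Path


def total_hours(wav_dir: str) -> float:
    """Sum the duration of every .wav file directly inside wav_dir, in hours.

    Assumes plain PCM WAV files (what the standard `wave` module can read).
    """
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # Duration of one clip = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 3600.0
```

For example, `total_hours("my_dataset/wav")` (a hypothetical path) returns a float that can be compared directly against 7.0 or 24.0.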

Before working with an actor's voice, I'm considering using my own voice for a preliminary test to gauge how well this works. Would recording around 100 sentences be enough if all I expect is to synthesize a single word, regardless of quality (whether it sounds robotic or realistic)? If not, what would you recommend as the minimum dataset needed to generate one word that appears in the training data? (I would record it using Piper Recording Studio.)
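To put "100 sentences" in perspective against the hour-based targets, a quick back-of-the-envelope calculation helps. The average of 5 seconds per recorded sentence used below is purely an illustrative assumption (measure your own clips), and both functions are hypothetical helpers, not Piper APIs. This only estimates audio duration; it says nothing about whether a given amount is enough for Piper to produce usable output.

```python
import math


def estimated_hours(num_sentences: int, avg_seconds_per_sentence: float) -> float:
    """Total recorded audio in hours, if each sentence lasts about avg_seconds_per_sentence."""
    return num_sentences * avg_seconds_per_sentence / 3600.0


def sentences_needed(target_hours: float, avg_seconds_per_sentence: float) -> int:
    """Number of sentences to record to reach target_hours, rounded up."""
    return math.ceil(target_hours * 3600.0 / avg_seconds_per_sentence)
```

Under the 5-second assumption, `estimated_hours(100, 5.0)` is about 0.14 hours (roughly 8 minutes), while `sentences_needed(8, 5.0)` is 5760 and `sentences_needed(24, 5.0)` is 17280.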

I understand that it's generally preferable to train new models starting from an existing one (fine-tuning), but since my native language isn't currently supported, I'm opting to train from scratch.

Thank you for your insights.

luiscarlos2000 commented 1 month ago

I'm not the developer, but just a question: what new language do you want Piper to support? I'd like to know more about it. It looks like a very interesting project.