New language from scratch

Hello everyone,

I've recently begun experimenting with TTS and, to learn more about it, I'd like to add my native language from the ground up. From what I understand, a substantial amount of training data is crucial for good results: 7-8 hours of recordings would be a reasonable starting point, while around 24 hours seems ideal (I took that figure from the size of the LJ Speech dataset; please correct me if I'm mistaken).

Before bringing in a voice actor, I'm considering recording my own voice as a preliminary test of the whole process. Would around 100 recorded sentences be enough for that, if all I expect is to synthesize a single word, regardless of quality (robotic or realistic)? If not, what is the minimum amount of data you would recommend for generating one word that appears in the recordings? I would record with Piper Recording Studio.
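
To check how much audio I actually have against those hour targets, I'm thinking of something like the script below. It is only a sketch: the `total_hours` helper is a name I made up, and it assumes the recordings are plain PCM `.wav` files in one folder (with subfolders), which may not match how the recording tool stores them.

```python
import wave
from pathlib import Path


def total_hours(wav_dir):
    """Sum the duration of every .wav file under wav_dir, in hours.

    Hypothetical helper, not part of Piper. Assumes uncompressed PCM WAV
    files, which is what Python's standard `wave` module can read.
    """
    seconds = 0.0
    for path in sorted(Path(wav_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            # duration of one file = number of frames / frames per second
            seconds += wav.getnframes() / wav.getframerate()
    return seconds / 3600.0
```

Calling `total_hours("my_recordings")` (the folder name is just an example) would tell me how far 100 sentences are from the 7-8 hour mark.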

I understand that fine-tuning an existing model is generally preferable to training a new one, but since my native language isn't currently supported, I'm opting to start from scratch.

Thank you for your insights.

rhasspy/piper #452