Feature request - [Slovenian language]

snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

5k stars, 316 forks

Feature request - [Slovenian language] #234

Open ppisljar opened 1 year ago

ppisljar commented 1 year ago

🚀 Feature

Please add support for the Slovenian language. A high-quality dataset is available here:

audio: https://www.clarin.si/repository/xmlui/handle/11356/1776
transcriptions: https://www.clarin.si/repository/xmlui/handle/11356/1772

Artur_B_Studio (inside the dataset) contains 50 hours of a single speaker recorded in a studio (high quality). In total, the dataset has 800 transcribed hours (multiple speakers, varying quality).

For the phonemizer, you can use espeak-ng with the "sl" language (the "slovenian" voice).
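
As a rough illustration of the espeak-ng suggestion, here is a minimal sketch that calls the `espeak-ng` command-line tool from Python to get IPA phonemes for Slovenian text. It assumes `espeak-ng` is installed and on `PATH`. The helper names `build_espeak_command` and `phonemize` are made up for this example and are not part of silero-models or espeak-ng.

```python
import shutil
import subprocess
from typing import List, Optional


def build_espeak_command(text: str, voice: str = "sl") -> List[str]:
    """Build the espeak-ng invocation (hypothetical helper for this sketch).

    -q     : do not play audio
    --ipa  : print phonemes in the International Phonetic Alphabet
    -v sl  : select the Slovenian voice
    """
    return ["espeak-ng", "-q", "--ipa", "-v", voice, text]


def phonemize(text: str, voice: str = "sl") -> Optional[str]:
    """Return the IPA transcription, or None if espeak-ng is not installed."""
    if shutil.which("espeak-ng") is None:
        return None
    result = subprocess.run(
        build_espeak_command(text, voice),
        capture_output=True,
        text=True,
        check=True,  # raise if espeak-ng reports an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Prints the IPA string, or None when espeak-ng is missing.
    print(phonemize("Dober dan, kako si?"))
```

The output can then be paired with the transcriptions from the dataset. Whether this matches the preprocessing the maintainers would use is an open question.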

Roshett commented 1 year ago

Hi! Could you please share your experience of creating a model for a new language? It would be very helpful. I want to create a model for Greek, and your advice could help me.