rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

Adding a new language, but getting errors in preprocessing step #465

Status: Open. isolveit-aps opened this issue 1 month ago

isolveit-aps commented 1 month ago

I am running into an issue similar to those in #49 and #316, with the error: "Failed to set eSpeak-ng voice"

Anyway, I'm training a Faroese voice (https://en.wikipedia.org/wiki/Faroese_language). I managed to add Faroese manually to my local espeak-ng installation, and it now speaks Faroese and produces phonemes for Faroese text:

$ espeak-ng -v fo "Hey Andras, hvussu gongur hjá tær í dag?" --ipa
hˈeɪː ˈandəras
kʋˈyssy ɡˈoʊŋɡyr çaʊ taɪr ˈui dˈaːx

However, in Piper I get these errors during preprocessing:

$ python3 -m piper_train.preprocess   --language fo   --input-dir /home/andras/piper/datasets/andras   --output-dir /home/andras/piper/fo_outpu1   --dataset-format ljspeech   --single-speaker   --sample-rate 22050
INFO:preprocess:Single speaker dataset
INFO:preprocess:Wrote dataset config
INFO:preprocess:Processing 260 utterance(s) with 20 worker(s)
ERROR:preprocess:Failed to process utterance: Utterance(text='“Jú, ein afturat gongur nokk. Ger so væl, góði” segði omman.', audio_path=PosixPath('/home/andras/piper/datasets/andras/wavs/0000000006.wav'), speaker=None, speaker_id=None, phonemes=None, phoneme_ids=None, audio_norm_path=None, audio_spec_path=None, missing_phonemes=Counter())
Traceback (most recent call last):
  File "/home/andras/piper/src/python/piper_train/preprocess.py", line 302, in phonemize_batch_espeak
    all_phonemes = phonemize_espeak(casing(utt.text), args.language)
  File "/home/andras/piper/src/python/.venv/lib/python3.10/site-packages/piper_phonemize/__init__.py", line 38, in phonemize_espeak
    return _phonemize_espeak(text, voice, str(data_path))
RuntimeError: Failed to set eSpeak-ng voice

I suspect this has to do with the interplay between Piper and espeak-ng, but I haven't been able to figure it out. The language code is fo: https://en.wikipedia.org/wiki/Faroese_language
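A quick way to narrow this down is to check whether the espeak-ng-data directory that piper_phonemize reads from contains the Faroese voice at all. Below is a minimal Python sketch. `inspect_espeak_data` is a hypothetical helper, and two things in it are my assumptions rather than anything confirmed in this thread: the directory layout (a compiled `fo_dict` plus a language file named `fo` somewhere under `lang/`), and that piper_phonemize defaults to an `espeak-ng-data` folder inside its own package directory.

```python
from pathlib import Path
from typing import Dict, List


def inspect_espeak_data(data_dir: Path, voice: str) -> Dict[str, object]:
    """Report whether an espeak-ng-data directory appears to contain `voice`.

    Assumed layout (typical of espeak-ng installs, not guaranteed):
      <data_dir>/<voice>_dict     compiled pronunciation dictionary
      <data_dir>/lang/**/<voice>  language definition file
    """
    data_dir = Path(data_dir)
    dict_file = data_dir / f"{voice}_dict"
    lang_dir = data_dir / "lang"
    # Language files can sit at any depth under lang/ (e.g. lang/gmq/<voice>),
    # so search recursively for files named exactly like the voice code.
    lang_files: List[Path] = (
        [p for p in lang_dir.rglob(voice) if p.is_file()] if lang_dir.is_dir() else []
    )
    return {
        "data_dir_exists": data_dir.is_dir(),
        "has_dict": dict_file.is_file(),
        "lang_files": sorted(str(p.relative_to(data_dir)) for p in lang_files),
    }


if __name__ == "__main__":
    try:
        import piper_phonemize  # assumed to ship its own espeak-ng-data folder

        bundled = Path(piper_phonemize.__file__).parent / "espeak-ng-data"
        print("piper_phonemize:", inspect_espeak_data(bundled, "fo"))
    except ImportError:
        print("piper_phonemize is not installed in this environment")
```

Running the helper on both the bundled directory and the system one (for example /usr/share/espeak-ng-data, depending on how espeak-ng was installed) should show whether the Faroese files exist only in the local espeak-ng installation.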

A bit of background, if interested :)

I am currently trying to train on my own voice, using less than 1 hour of recordings, to test Piper for Faroese. There is also an open-source dataset of 100 hours of speech across 433 speakers, so quite a lot of data is available if I can get training to work. A possible weakness of this dataset is that no individual speaker has much more than ½ hour of recordings. https://mtd.setur.fo/en/resource/ravnur-blark-1-0/

There is also a smaller, differently structured dataset based on the same recordings, which was used to train the ASR model in the VoisIT app (Android/App Store; my personal project). That dataset is available here: https://repository.clarin.is/repository/xmlui/handle/20.500.12537/276

The trained ASR model and other models are also available: https://huggingface.co/carlosdanielhernandezmena?search_models=faroese

(Credit to the two teams above, "Ravnur" and "Ravnursson", for gathering the Faroese data and training the models.)

atabekm commented 3 weeks ago

I had the same problem. As the stack trace shows, the issue is in the piper_phonemize package. If you browse to the /home/andras/piper/src/python/.venv/lib/python3.10/site-packages/piper_phonemize/ folder, you will see an espeak-ng-data folder, which I assume comes from the espeak-ng project on GitHub. It doesn't contain the custom language you added locally, so you can copy the espeak-ng-data folder from your local espeak-ng installation into /home/andras/piper/src/python/.venv/lib/python3.10/site-packages/piper_phonemize/. This should solve the issue; at least it did for me.
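The copy step described above can be scripted roughly as follows. This is a sketch, not an official Piper tool: `replace_bundled_espeak_data` is a hypothetical helper that keeps a backup of the bundled folder as `espeak-ng-data.orig` and then merges the local tree into the bundled one, overwriting same-named files the way `cp -r` would. It assumes the local espeak-ng build is close enough in version to the espeak-ng that piper_phonemize was compiled against; if the compiled data formats differ, loading the voice may still fail, which is why the backup is kept.

```python
import shutil
from pathlib import Path
from typing import Optional


def replace_bundled_espeak_data(
    local_data: Path, pkg_dir: Path, backup: bool = True
) -> Optional[Path]:
    """Merge a local espeak-ng-data tree into <pkg_dir>/espeak-ng-data.

    Returns the backup directory if a backup was requested and the bundled
    folder existed, otherwise None.
    """
    local_data = Path(local_data)
    target = Path(pkg_dir) / "espeak-ng-data"
    if not local_data.is_dir():
        raise FileNotFoundError(f"no espeak-ng-data directory at {local_data}")

    backup_path: Optional[Path] = None
    if backup and target.exists():
        backup_path = target.with_name(target.name + ".orig")
        # Only back up once, so repeated runs keep the pristine bundled copy.
        if not backup_path.exists():
            shutil.copytree(target, backup_path)

    # dirs_exist_ok (Python 3.8+) merges into the existing tree and
    # overwrites files that exist in both places.
    shutil.copytree(local_data, target, dirs_exist_ok=True)
    return backup_path
```

For the source path, `espeak-ng --version` should print where the local data lives; the target is the piper_phonemize package directory from the traceback (/home/andras/piper/src/python/.venv/lib/python3.10/site-packages/piper_phonemize). Afterwards, calling `phonemize_espeak("Hey Andras, hvussu gongur hjá tær í dag?", "fo")` from piper_phonemize is a quick check before rerunning the preprocessing.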