thorstenMueller / Thorsten-Voice

Thorsten-Voice: A free to use, offline working, high quality german TTS voice should be available for every project without any license struggling.
http://www.thorsten-voice.de
Creative Commons Zero v1.0 Universal
548 stars 52 forks

Abbreviation (bzgl.) pronounced wrong (but works with espeak-ng) #69

Open GithubAnon0000 opened 2 months ago

GithubAnon0000 commented 2 months ago

Hello again!

I am using piper with the Thorsten (high) voice. I wanted to see whether "bzgl." can be pronounced correctly without substituting a separate string that says "bezüglich". With your voice, however, there is always a long pause after it, whereas espeak handles it fine.

Maybe you've got an idea?

Steps to reproduce

  1. Edit the espeak dictionary by adding the following to the de_extra file: bzgl b@ts'y:klIC $dot
  2. Compile the dictionary and copy it to the .dict file that piper uses: sudo espeak-ng --compile=de && cp /usr/lib/x86_64-linux-gnu/espeak-ng-data/de_dict ../TTS/espeak-ng-data/de_dict
  3. Use echo "Ich habe Fragen bzgl. Ihrer Rückmeldung." | ./piper --model ./de_DE-thorsten-high.onnx --output-file ../OUTPUT/text.wav for the audio generated with your voice model.
  4. Use espeak-ng "Ich habe Fragen bzgl. Ihrer Rückmeldung." -v German --stdout > ../OUTPUT/text_espeak.wav to generate the same audio with espeak.
  5. Compare the results: OUTPUT.zip

The voice is obviously different, but so is the pronunciation. A workaround is to just use "bezüglich" instead of "bzgl.".

Expected Behavior

The pause after "bzgl." shouldn't be there.

Actual behavior

The pause is there.

Other things tried

According to the espeak dictionary docs, I tried the following alternatives one by one:

bzgl    b@ts'y:klIC $dot
bzgl    b@ts'y:klIC $hasdot
bzgl    bezüglich $text $dot
bzgl    bezüglich $text $hasdot

None were successful with the Thorsten voice, though. Adding a dot after bzgl made it worse, even in espeak:

bzgl.   b@ts'y:klIC $dot
bzgl.   b@ts'y:klIC $hasdot
bzgl.   bezüglich $text $dot
bzgl.   bezüglich $text $hasdot

Version info

piper: 1.2.0 OS: Debian oldstable (gnome 3.38.5, X11) python: 3.9.2

GithubAnon0000 commented 2 months ago

I'll have to learn more about how the model was trained (and how piper uses the model), since I have come to the conclusion that the model itself is somehow causing this.

It happens with normal words and sentences too. The same sentence is not pronounced the same way every time, even though espeak's dictionaries are quite deterministic. Running piper with --debug actually shows the phonemes (just like espeak-ng --ipa). They are identical.

Judging by that, the model probably has some sort of variance for some reason. I'll have to learn more about it first, but I believe the way the model was trained has something to do with it, since AI tends to do things like that (and you trained it using Coqui). Maybe it's more or less easily fixable (I'd prefer deterministic output if possible). It's low priority for me though.
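
To confirm the variance described above independently of listening, one option is to synthesize the same sentence several times and compare the resulting files byte for byte. The following is only a sketch: the helper names and the file names in the usage note are hypothetical, and it assumes the WAV files have already been written by separate piper runs.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path):
    """Return the SHA-256 hex digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def outputs_identical(paths):
    """Return True if all given files have exactly the same bytes.

    An empty list yields False, since there is nothing to compare.
    """
    digests = {sha256_of_file(p) for p in paths}
    return len(digests) == 1
```

For example, after writing run1.wav and run2.wav (hypothetical names) from the same input sentence, outputs_identical(["run1.wav", "run2.wav"]) returning False would indicate that the synthesis step is not deterministic, even though the phonemes are identical.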

thorstenMueller commented 2 months ago

First of all thank you for your great and detailed description 👍.

One idea might be to clean the text before TTS processing, e.g. using https://github.com/repodiac/german_transliterate. Is the $dot at the end of the adjusted dictionary entry required? Maybe that is the reason for the break, which is meant to come after a dot character.

I tried your sentence on my Hugging Face Spaces. In the Piper space, "Ich habe Fragen bezüglich. Ihrer Rückmeldung." has a break after "bezüglich.", while "Ich habe Fragen bezüglich Ihrer Rückmeldung." sounds good, without a break, like the espeak speech flow.

My trained Coqui models have that break too (as expected) when a dot is added after "bezüglich".

So I'm not sure whether the $dot at the end of the adjusted dictionary entry has anything to do with it.

GithubAnon0000 commented 2 months ago

> My trained Coqui models have that break too (as expected) when a dot is added after "bezüglich".

Yes, but they shouldn't. At least not if you use the actual abbreviation as outlined in the "Steps to reproduce" part. → Not "… bezüglich. …", but "… bzgl. …".

> Is this $dot at the end of the adjusted dictionary required? Maybe that's a reason for the break, which is meant to be after a dot character.

The $dot basically says that the word "bzgl." has a dot but isn't supposed to be spoken with a break after that dot. It works fine with espeak, but not with piper and your model. I'm now guessing that the model never learned about abbreviations during training and thus always assumes it should read a break after a dot (which, in the case of "bzgl.", it shouldn't).

> One idea might be to clean the text before tts processing

Yes, that's what I'm currently doing (although with my own bash script). It works, since all I have to do is change abbreviations like "bzgl.", "z. B." etc. to their long forms ("bezüglich", "zum Beispiel"). Since this works, this issue is low priority for me, as stated above. But if I could adjust the model or dictionary files somehow so that preprocessing becomes redundant, that would be great.
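
For illustration, that kind of preprocessing can be sketched as follows. This is a Python stand-in, not the bash script mentioned above (which isn't shown here). The mapping and the function name are hypothetical and would need to be extended with whatever abbreviations actually occur in the input.

```python
import re

# Hypothetical mapping of abbreviations to their long forms; extend as needed.
ABBREVIATIONS = {
    "bzgl.": "bezüglich",
    "z. B.": "zum Beispiel",
    "z.B.": "zum Beispiel",
}


def _build_pattern(abbreviations):
    # Try longer keys first so overlapping abbreviations resolve predictably.
    keys = sorted(abbreviations, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # (?<!\w) avoids matching inside a longer word, e.g. "abzgl." stays as-is.
    return re.compile(r"(?<!\w)(?:" + alternatives + r")")


_PATTERN = _build_pattern(ABBREVIATIONS)


def expand_abbreviations(text):
    """Replace each known abbreviation (including its dot) with its long form."""
    return _PATTERN.sub(lambda match: ABBREVIATIONS[match.group(0)], text)


print(expand_abbreviations("Ich habe Fragen bzgl. Ihrer Rückmeldung."))
# prints: Ich habe Fragen bezüglich Ihrer Rückmeldung.
```

The expanded text can then be piped into piper as in step 3 of the reproduction steps. Because the abbreviation's dot is removed along with it, an abbreviation at the very end of a sentence would lose its full stop, so that case may need separate handling.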

Thanks for your time and looking into it!