rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

When espeak-ng translates Chinese (cmn), IPA tone symbols are not output correctly #305

Open yzznw opened 8 months ago

yzznw commented 8 months ago

Hi, and thank you for Piper! It's a great project, and I can use it to generate Chinese speech. I found some issues; please take a look.

When espeak-ng translates Chinese (cmn), IPA symbols are not output correctly. Here's an example:

Q1:

Z:\aaa>echo 此电脑 | espeak-ng -x -vcmn --path=.
tsh'i[213_| tiE51nn'Au213_|

With espeak-ng's own phoneme mnemonics the output is correct. With --ipa, however, the output is:

Z:\aaa>echo 此电脑 | espeak-ng -x -vcmn --ipa --path=.
tshˈi̪2 tiɛ5nnˈɑu2

So 213 became 2, 51 became 5, and 213 became 2: only the first digit of each tone number survives.

I found that the following code needs to be changed for 'cmn'. In "dictionary.c", in the function "WritePhMnemonic":

    if (!first && IsDigit09(c))
        continue;

This causes the Q1 problem. If I remove it, the output looks correct.

Q2 (I changed the tone rule in cmn.dic so that pinyin tone 1 maps to IPA tone 33):

Z:\aaa>echo 一 | espeak-ng -x -vcmn --path=.
j'i33_|

Z:\aaa>echo 一 | espeak-ng -x -vcmn --ipa --path=.
jˈiɜ

So 33 was replaced by a different IPA symbol (ɜ)!

I then found that the following code also needs to be changed for 'cmn', again in "dictionary.c", in the function "WritePhMnemonic":

    if ((c >= 0x20) && (c < 128)) {
        c = ipa1[c-0x20];
    }

If I change it to:

    if (!IsDigit09(c) && (c >= 0x20) && (c < 128)) {
        c = ipa1[c-0x20];
    }

everything seems to work.

Test:

Z:\aaa>echo 一台电脑 | espeak-ng -x -vcmn --path=.
ji51th'ai24_| tiE51nn'Au213_|

Z:\aaa>echo 一台电脑 | espeak-ng -x -vcmn --ipa --path=.
ji51thˈai24 tiɛ51nnˈɑu213

It works!
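The effect of the two conditions discussed above can be reproduced in isolation with a toy model. This is only a sketch, not espeak-ng's code: toy_write_mnemonic, toy_ipa, and the patched flag are hypothetical names, and toy_ipa is a three-entry stand-in for the real ipa1 table. It assumes each tone number is handled as one phoneme mnemonic. With patched = 0 it models the behaviour reported in Q1 and Q2; with patched = 1 it models both proposed changes.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for an ASCII-to-IPA lookup. It is NOT espeak-ng's ipa1 table;
   it only covers the three characters seen in the examples above. */
static const char *toy_ipa(char c)
{
    switch (c) {
    case 'E': return "\xC9\x9B"; /* U+025B in UTF-8 */
    case 'A': return "\xC9\x91"; /* U+0251 in UTF-8 */
    case '3': return "\xC9\x9C"; /* U+025C, the symbol that replaced "33" */
    default:  return NULL;       /* no mapping: keep the character as-is */
    }
}

static int is_digit09(char c) { return c >= '0' && c <= '9'; }

/* Append s to out if it fits (terminator included); return the new length. */
static size_t append(char *out, size_t pos, size_t cap, const char *s)
{
    size_t n = strlen(s);
    if (pos + n + 1 > cap)
        return pos;
    memcpy(out + pos, s, n + 1);
    return pos + n;
}

/* Convert one phoneme mnemonic to UTF-8 output; returns the byte length. */
size_t toy_write_mnemonic(const char *mnem, int patched, char *out, size_t cap)
{
    size_t pos = 0;
    int first = 1;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (const char *p = mnem; *p != '\0'; p++) {
        char single[2] = { *p, '\0' };
        const char *mapped = NULL;

        if (!patched && !first && is_digit09(*p))
            continue;               /* Q1: digits after the first are dropped */
        if (!(patched && is_digit09(*p)))
            mapped = toy_ipa(*p);   /* Q2: unpatched, a leading '3' is remapped */

        pos = append(out, pos, cap, mapped != NULL ? mapped : single);
        first = 0;
    }
    return pos;
}
```

Calling toy_write_mnemonic("213", 0, buf, sizeof buf) leaves "2" in buf, while the same call with patched = 1 leaves "213", which matches the before and after outputs shown in this comment.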

Another problem: the tones in the voice produced by zh_CN_huayan_medium.onnx sound strange to me.

Chinese has 5 tones, numbered 1 to 5. espeak-ng maps them to IPA tone contours as follows: 1 (55), 2 (35), 3 (214), 4 (51), 5 (11). But with the huayan model, tone 1 sometimes sounds like tone 2, and tone 4 sometimes sounds like tone 1, so whole sentences sound strange.
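For checking phonemizer output against the expected tones, the mapping listed above can be written as a small lookup. This is a minimal sketch: tone_contour is a hypothetical helper, not part of espeak-ng or Piper, and the contour values are simply the ones quoted in this comment.

```c
#include <stddef.h>

/* Hypothetical helper: pinyin tone number (1..5) -> tone contour string.
   Values are copied from the list above, not read from espeak-ng's data. */
const char *tone_contour(int tone)
{
    static const char *const contours[] = { "55", "35", "214", "51", "11" };

    if (tone < 1 || tone > 5)
        return NULL;            /* not a valid tone number */
    return contours[tone - 1];
}
```

For example, tone_contour(4) returns "51", and any number outside 1 to 5 returns NULL.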

My guess is that espeak-ng was used to generate the IPA from the label text when the model was trained. Is that right?

If so, could you please retrain the model once espeak-ng is fixed? :)

colafly commented 1 month ago

@yzznw do you know the status of this bug?

yzznw commented 1 month ago

Sorry, I don't know. I haven't worked with Piper TTS for a long time. The espeak-ng in newer versions of Piper may still have the same problem with Chinese. If you want to use TTS for Chinese, I would suggest a TTS model from China, which will be more accurate. PaddleSpeech, for example, is good enough for Chinese speech output.

qt06 commented 4 weeks ago

I'm very interested in progress on this.