rhdunn / espeak

eSpeak NG is an open source speech synthesizer that supports 101 languages and accents.
http://reecedunn.co.uk/espeak-for-android
GNU General Public License v3.0

Fail to set Cantonese #42

Closed hgneng closed 11 years ago

hgneng commented 11 years ago

I've tried the latest code (03dface) on an Android 4.1 emulator, and I fail to switch to Cantonese. When I play the sample text, it speaks "This is a sample text in" followed by the last Chinese word. It handles the English the way the Mandarin voice does, but with higher speech. I should have put the Cantonese dictionary in espeakdata.zip.

I am glad to see that it runs on the emulator; I failed to run eyesfree's code there.

pvagner commented 11 years ago

As per issue #7, the additional data for Cantonese is not yet included. Is this the only issue, or does the Cantonese voice fail to load even if you add the proper dict_source file before building?

hgneng commented 11 years ago

I have built espeakdata.zip with the Cantonese dictionary before (but that was version 1.46.02). I am not sure whether it's a version issue, but that seems unlikely because other languages, including Mandarin, work well.

I will try to debug the issue and provide more info (some days later, when I have time). I think the problem arises in setLanguage.

hgneng commented 11 years ago

Here is a line from logcat when speaking the sample text in Cantonese:

    12-28 17:55:55.460: V/eSpeakService(635): Java_com_reecedunn_espeak_SpeechSynthesis_nativeSetVoiceByProperties(name=(null), languages=nci)

The language code is "nci". What is it?

rhdunn commented 11 years ago

The nci language code is for Classical Nahuatl, which is a language that espeak supports. It and prs (Afghan Persian) are not translated by Android to their localised names (you see them as the language codes in the set language UI). See http://www.iana.org/assignments/language-subtag-registry for language codes and their language names.

In the Voice constructor [https://github.com/rhdunn/espeak/blob/android/android/src/com/reecedunn/espeak/SpeechSynthesis.java] I am setting the Cantonese voice to 'yue' instead of 'zh-yue'. This is because Android/Java does not recognise macrolanguages. In zh-yue, zh/Chinese is the macrolanguage and yue/Cantonese is the language. On Android, yue is displayed as Cantonese, but zh-yue is displayed as Chinese (YUE): Android treats yue there as a region code and, as it does not recognise that region code, displays it in upper case.
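The region-code behaviour described above can be seen in a small Java sketch. It uses only `java.util.Locale`; the class name `MacroLanguageDemo` and the helper `asLanguageAndRegion` are made up for illustration. Display names come from the runtime's locale data and differ between Android versions and desktop JVMs, so no display output is shown.

```java
import java.util.Locale;

class MacroLanguageDemo {
    // Builds a locale whose second subtag is treated as a region,
    // which is how a two-argument Locale interprets "zh" + "yue".
    static Locale asLanguageAndRegion(String language, String region) {
        return new Locale(language, region);
    }

    public static void main(String[] args) {
        Locale zhYue = asLanguageAndRegion("zh", "yue");
        Locale yue = new Locale("yue");

        // The region subtag is normalised to upper case.
        System.out.println(zhYue.getCountry());
        // Display names depend on the runtime's locale data and may vary.
        System.out.println(zhYue.getDisplayName(Locale.ENGLISH));
        System.out.println(yue.getDisplayName(Locale.ENGLISH));
    }
}
```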

I am not sure why nci is being passed instead of yue. NOTE: I have not yet implemented changing the language when you set the system locale, nor added tests to verify the correct behaviour. At the moment you need to go into the TTS settings > eSpeak > Set Language UI and select Cantonese there. That should set it correctly.

Android resources use the 2 letter locale codes. They don't support/use the Java 3 letter codes or the additional 3 letter language codes in the IANA language subtag registry. This means that yue is not supported. What I am currently doing is mapping yue to zh, so it should pick up the Chinese data (same for zho to zh). It may be failing to pick up the Chinese text because the code is not specifying the country code (CN, HK, etc.) or script (Hans or Hant).
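A minimal sketch of that kind of mapping, assuming a simple lookup table. The class `ResourceLanguageMapper` and the method `toResourceLanguage` are hypothetical names and are not taken from the eSpeak source; only the yue and zho entries come from the comment above.

```java
import java.util.HashMap;
import java.util.Map;

class ResourceLanguageMapper {
    // Hypothetical table: three-letter codes mapped to the two-letter
    // codes that Android resource qualifiers use.
    private static final Map<String, String> THREE_TO_TWO = new HashMap<>();
    static {
        THREE_TO_TWO.put("yue", "zh"); // Cantonese falls back to the Chinese data
        THREE_TO_TWO.put("zho", "zh"); // three-letter code for Chinese
    }

    // Returns the mapped resource language, or the input unchanged if unmapped.
    static String toResourceLanguage(String language) {
        String mapped = THREE_TO_TWO.get(language);
        return mapped != null ? mapped : language;
    }
}
```

A mapping like this only selects the language; it does not add a country code or script, which is the gap suspected above.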

I don't know if that is the case.

NOTE: I have the locale logic tested in the eSpeakTests project (which you can run as an Android JUnit project from Eclipse). That might help track down the issues.

hgneng commented 11 years ago

"At the moment you need to go into the TTS settings > eSpeak > Set Language UI and select Cantonese there."

I don't quite understand. When I mentioned "fail to set Cantonese", I meant setting it in the eSpeak settings UI, not automatic detection from the system locale. In other words, I have no way to switch to Cantonese.

rhdunn commented 11 years ago

Thanks for the clarification.

The only thing I can think of is that the language codes yue, nci and prs all result in a blank locale code when Locale.getISO3Language is called, which is what Android calls in some cases from what I understand. This makes sense, given that nci is the first locale with a blank ISO3 language in the list. That means that the Android TTS settings UI must be comparing Locale.getISO3Language, not Locale.getLanguage.

Therefore, there are two changes that are needed here:

  1. Use zh-MO -- Chinese (Macau) -- for Cantonese. NOTE: zho-yue cannot be used as yue is only recognised as Cantonese when used as a language code (Android does not handle extlang codes).
  2. Do not add a voice if the Locale.getISO3Language call is blank. This will cause nci and prs to be excluded until those codes are added to the Android/Java locale data. To get this working, in https://github.com/rhdunn/espeak/blob/android/android/src/com/reecedunn/espeak/SpeechSynthesis.java, the special locale fixes in the Voice constructor need to be moved to the getAvailableVoices method call.
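Change (2) could be sketched roughly as below. This is an illustration under assumptions, not the project's code: `VoiceFilter`, `hasIso3Language` and `usableLocales` are hypothetical names. Which locales come back blank depends on the platform's locale data, so nci and prs may not be filtered out on every runtime.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.MissingResourceException;

class VoiceFilter {
    // True when the runtime supplies a non-blank three-letter language code.
    static boolean hasIso3Language(Locale locale) {
        try {
            return !locale.getISO3Language().isEmpty();
        } catch (MissingResourceException e) {
            // getISO3Language may throw instead of returning a blank string.
            return false;
        }
    }

    // Keeps only the locales that pass the check above, preserving order.
    static List<Locale> usableLocales(List<Locale> candidates) {
        List<Locale> usable = new ArrayList<>();
        for (Locale candidate : candidates) {
            if (hasIso3Language(candidate)) {
                usable.add(candidate);
            }
        }
        return usable;
    }
}
```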

Does using (1) fix this issue for you?

hgneng commented 11 years ago

What/where shall I change to zh-MO? eSpeakSupportedVoices in espeakengine.cpp? I grepped and found no "zho-yue" in the source tree. I have tried changing "zho-HKG" to "zh-MO", but I got the same nci result.

rhdunn commented 11 years ago

espeakengine.cpp is for the pre-Android 4.0 support which I have not got working properly yet and am thinking of dropping as it is too complex to maintain both implementations with associated bug fixes.

Go to android/src/com/reecedunn/espeak/SpeechSynthesis.java, lines 240-242, and change:

        } else if (name.equals("zh-yue")) {
            // Android/Java does not support macrolanguages.
            locale = new Locale("yue");
        } else {

to

        } else if (name.equals("zh-yue")) {
            // Android/Java does not support macrolanguages.
            locale = new Locale("zh", "MO");
        } else {

That should report Cantonese as "Chinese (Macau)". Is there a better country code for Cantonese?

hgneng commented 11 years ago

It works! Thank you! I prefer to use "HK" for Hong Kong.
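For illustration, the same special case with the HK country code might look like the sketch below. `VoiceLocales` and `localeForVoice` are hypothetical names, and the fallback branch is an assumption; the real Voice constructor in SpeechSynthesis.java handles more cases than this.

```java
import java.util.Locale;

class VoiceLocales {
    // Hypothetical helper mirroring the zh-yue special case shown earlier.
    static Locale localeForVoice(String name) {
        if (name.equals("zh-yue")) {
            // Report Cantonese as Chinese (Hong Kong) instead of Chinese (Macau).
            return new Locale("zh", "HK");
        }
        // Assumption: any other name is a plain language code.
        return new Locale(name);
    }
}
```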

rhdunn commented 11 years ago

This has been implemented in the latest git code. Thanks for reporting and investigating this.