pvagner closed this issue 11 years ago.
As I assumed, this appears to be easy to fix. There is a flag that tells eSpeak to parse SSML while speaking. It has been enabled for ages, and is even set in the original eyes-free version. I have modified it in two locations because there are different code paths for Android 2.x and Android 4.x.
```diff
diff --git a/android/jni/jni/eSpeakService.cpp b/android/jni/jni/eSpeakService.cpp
index 19ea3cf..145c97c 100644
--- a/android/jni/jni/eSpeakService.cpp
+++ b/android/jni/jni/eSpeakService.cpp
@@ -325,7 +325,7 @@ JNICALL Java_com_reecedunn_espeak_SpeechSynthesis_nativeSynthesize(
     espeak_SetSynthCallback(SynthCallback);
     const espeak_ERROR result = espeak_Synth(c_text, strlen(c_text),
         0, // position
         POS_CHARACTER,
         0, // end position (0 means no end position)
diff --git a/android/jni/jni/espeakengine.cpp b/android/jni/jni/espeakengine.cpp
index 26dbfec..2606489 100644
--- a/android/jni/jni/espeakengine.cpp
+++ b/android/jni/jni/espeakengine.cpp
@@ -583,7 +583,7 @@ tts_result TtsEngine::synthesizeText(const char *text, int8_t *buffer, size_t bu
     espeak_Synth(text, strlen(text),
         0, // position
         POS_CHARACTER,
         0, // end position (0 means no end position)
```
Oops, it is not a good idea to paste code like this here. The full patch is at http://pastie.org/6470628
Looking at the `nativeSynthesize` implementation in `eSpeakService.cpp`, the `espeak_Synth` function is being called with the `espeakSSML` flag, causing the text to be handled as potentially containing SSML and HTML tags.
Using the desktop version of eSpeak (tested with 1.46.46), SSML/HTML handling appears to work as expected:
```
espeak -m "Hello <b>world</b>."
espeak -m "Hello < world."
espeak -m "Hello < world >."
```

All of them say "world" correctly.
Some other things to note:
1. `<b>` and `<b >` are recognised, but `< b>` and `< b >` are not.
2. `one <two three four> five` is spoken as "one five". While this is logical for XML/HTML content, the behaviour is problematic for the mixed text/SSML content handled by most text-to-speech programs (e.g. a simple `a<b` can cause the rest of the text to be ignored).
3. Only a fixed set of SSML and HTML tags is recognised (the `ssmltags` array in `src/readclause.cpp`). Any unrecognised tags are simply skipped. Again, while logical for XML/HTML content, this does not make sense for mixed text/SSML.
4. eSpeak treats `<! ... >` as a comment instead of the correct `<!-- ... -->`.

The simplest way to address this would be to remove the `espeakSSML` flag, but PICO and other text-to-speech applications handle SSML in text.
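To illustrate why treating every `<` as the start of a tag hurts mixed text such as `one <two three four> five` or `a<b`, here is a deliberately naive model. It is an assumption for illustration only, not eSpeak's actual `readclause.cpp` logic: each `<` starts a "tag" that is discarded up to the next `>`, and an unterminated `<` discards everything after it.

```cpp
#include <string>

// Naive model (NOT eSpeak's real parser): every '<' opens a "tag" that is
// dropped up to the next '>'. If no '>' follows, the rest of the text is lost.
std::string naive_strip_tags(const std::string& text) {
    std::string out;
    bool in_tag = false;
    for (char c : text) {
        if (in_tag) {
            if (c == '>') in_tag = false;  // tag closed, resume copying text
        } else if (c == '<') {
            in_tag = true;                 // start discarding characters
        } else {
            out += c;                      // ordinary text is kept
        }
    }
    return out;
}
```

With this model, `naive_strip_tags("one <two three four> five")` yields `"one  five"` and `naive_strip_tags("if a<b then stop")` yields `"if a"`, which mirrors the two failure modes described above.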
The desktop versions correctly read hello when given a command
espeak -m "hello
I have pushed a change that removes `espeakSSML`, similar to the patch you provided. The correct solution would be for Android to inform the text-to-speech engine that the text contains SSML, so that the engine processes it accordingly, and otherwise processes it as normal text.
I am going to investigate this a bit more, including how PICO handles SSML.
Android does not support a "this is SSML" option in its API, so text-to-speech voices assume the text is SSML (either with or without an explicit `<speak>` tag at the start). That is, they process it as pseudo-XML, as the desktop text-to-speech engines do.
I am not sure how good these engines are at handling XML-like content, but I assume they only recognise SSML tags and treat anything that is not recognised, or is not a valid XML tag, as text. That would be the best approach for mixed text/SSML content.
Doing that would require improving the SSML processing in eSpeak itself and should be fixed in the upstream version (thus fixing the behaviour on the desktop as well).
Also note that according to the XML spec, eSpeak's detection of start tags (note 1) in my comment above is correct -- a tag is only valid if there is no space between the less-than character and the first letter of the tag name.
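A minimal sketch of that "text + SSML soup" approach, assuming a hypothetical allow-list of tag names (`kKnownTags` below is illustrative and is not eSpeak's `ssmltags` array) and hypothetical helper names. A `<` is treated as markup only if it is immediately followed by a known tag name, per the XML start-tag rule above, has well-formed `name="value"` attributes, and is closed by `>`. Everything else is kept as text to be spoken.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical allow-list for illustration; eSpeak's real list differs.
const std::set<std::string> kKnownTags = {
    "speak", "voice", "prosody", "break", "say-as", "audio", "b", "i"};

struct Segment {
    bool is_tag;        // true: markup to interpret, false: text to speak
    std::string value;  // raw characters of this segment
};

namespace {

bool is_name_start(char c) { return std::isalpha(static_cast<unsigned char>(c)) != 0; }
bool is_name_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '-' || c == '_' || c == ':';
}
bool is_space(char c) { return c == ' ' || c == '\t' || c == '\n' || c == '\r'; }

// If s[pos] == '<' starts a well-formed tag with a known name, returns the
// index one past its closing '>'; otherwise returns pos (i.e. "not a tag").
std::size_t match_tag(const std::string& s, std::size_t pos) {
    std::size_t i = pos + 1;
    bool end_tag = false;
    if (i < s.size() && s[i] == '/') { end_tag = true; ++i; }
    // XML: the name must follow '<' (or '</') directly, so "< b>" is text.
    if (i >= s.size() || !is_name_start(s[i])) return pos;
    std::size_t name_start = i;
    while (i < s.size() && is_name_char(s[i])) ++i;
    if (kKnownTags.count(s.substr(name_start, i - name_start)) == 0) return pos;
    while (true) {
        while (i < s.size() && is_space(s[i])) ++i;
        if (i >= s.size()) return pos;                    // no closing '>'
        if (s[i] == '>') return i + 1;
        if (!end_tag && s[i] == '/' && i + 1 < s.size() && s[i + 1] == '>') return i + 2;
        if (end_tag || !is_name_start(s[i])) return pos;  // end tags take no attributes
        while (i < s.size() && is_name_char(s[i])) ++i;   // attribute name
        while (i < s.size() && is_space(s[i])) ++i;
        if (i >= s.size() || s[i] != '=') return pos;
        ++i;
        while (i < s.size() && is_space(s[i])) ++i;
        if (i >= s.size() || (s[i] != '"' && s[i] != '\'')) return pos;
        std::size_t close = s.find(s[i], i + 1);          // matching quote
        if (close == std::string::npos) return pos;
        i = close + 1;
    }
}

}  // namespace

// Splits input into alternating text and recognised-tag segments.
std::vector<Segment> split_text_and_tags(const std::string& text) {
    std::vector<Segment> out;
    std::string pending;  // text accumulated since the last tag
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t end = (text[i] == '<') ? match_tag(text, i) : i;
        if (end > i) {
            if (!pending.empty()) { out.push_back({false, pending}); pending.clear(); }
            out.push_back({true, text.substr(i, end - i)});
            i = end;
        } else {
            pending += text[i++];
        }
    }
    if (!pending.empty()) out.push_back({false, pending});
    return out;
}
```

With this sketch, `Hello <b>world</b>.` splits into text, tag, text, tag, text, while `if (x<y)`, `a<b then stop`, `< b>` and `one <two three four> five` stay as a single text segment. Unknown names like `two` are read rather than skipped, which is the trade-off argued for above.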
I must admit this has never been an issue for me personally, and I cannot find a real-world use case where it breaks things right now. I only reported this and suggested removing the SSML flag because another user reported it to the eyes-free list. Should we ask him for more input, or stick with the SSML flag removed? Should we bring this to the espeak-general list to learn the reasoning behind the current implementation and politely request the proposed enhancement to the SSML recognizer?
I have reported the issue to espeak-general. It would also be useful to gain more specific examples of what is broken in these cases.
I can imagine email containing code (e.g. `if (x<y)`) being broken when the SSML flag is used. ASCII-based math (such as `a<b`) is broken as well. I am not sure what else would be affected in the real world.
Note that Tyler's email mentions the "This is … `cd <path-to-project>/src`. So this does occur in real-world situations.
The problem gets more interesting when the text being passed is an example of SSML or HTML tags that eSpeak recognises. For example, a website could contain `This is <b>bold</b> text.`, which eSpeak will handle, but it won't speak the `b` tags. The browser that passes this to eSpeak receives the less-than and greater-than characters escaped as `&lt;` and `&gt;`, so it does not treat them as bold tags, but it passes them unescaped to the text-to-speech engine, so eSpeak treats them as bold tags. Ideally, the web browser should pass these in their escaped form.
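As a rough sketch of what "passing the escaped form" means, the hypothetical helper below escapes the characters that an XML/SSML parser treats specially. It assumes the receiving engine decodes the standard `&amp;`, `&lt;` and `&gt;` entities when parsing markup; that is an assumption, not something verified against eSpeak here.

```cpp
#include <string>

// Hypothetical helper: escape characters that a markup parser would
// interpret, so the text is read literally. '&' is escaped as well,
// otherwise an input that already contains "&lt;" would be decoded to "<".
std::string escape_markup(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}
```

For example, `escape_markup("This is <b>bold</b> text.")` produces `This is &lt;b&gt;bold&lt;/b&gt; text.`, which a markup-aware engine would speak as written instead of applying bold.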
Aside from disabling SSML support for now and re-enabling it when upstream adds "text+SSML soup" support, there are several improvements that can be made to provide a better user experience:
1. Only use the `espeakSSML` flag if the text starts with `<?xml`, `<speak`, `<html` or `<HTML`.
2. Add a configuration option to switch SSML behaviour, including:
   a. "Text only."
   b. "Mixed text, SSML and HTML content."

   The behaviour should then be:
   - … `espeakSSML` flag;
   - … `espeakSSML` flag;
   - … `espeakSSML` flag (i.e. text-only).

I have now added content detection for SSML documents. If the text is an SSML document, it is processed as such; otherwise it is processed as plain text. This is sufficient (there is no need for a user option or an upstream enhancement).
NOTE: I have also added a simple test in the main activity so you can enter text to be spoken.
This will be included in the next update.
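The content detection described above can be sketched roughly as follows, using the prefixes proposed in this thread (`<?xml`, `<speak`, `<html`, `<HTML`). The function name is hypothetical, and skipping leading whitespace before the check is an assumption of this sketch. The caller would then add `espeakSSML` to the `espeak_Synth` flags only when it returns true.

```cpp
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical check: does the text look like a whole SSML/HTML document?
// Leading whitespace is skipped first (an assumption, not from the thread).
bool looks_like_ssml_document(const std::string& text) {
    std::size_t i = 0;
    while (i < text.size() && std::isspace(static_cast<unsigned char>(text[i]))) ++i;
    static const char* const kPrefixes[] = {"<?xml", "<speak", "<html", "<HTML"};
    for (const char* prefix : kPrefixes) {
        // compare() with a shorter remaining string simply returns non-zero
        if (text.compare(i, std::strlen(prefix), prefix) == 0) return true;
    }
    return false;
}
```

Text such as `if (x<y)` or `This is <b>bold</b> text.` is not detected as a document, so it would be synthesized without `espeakSSML` and read literally.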
This has been brought up by Tyler Spivey on an eyes-free email list.
Steps to reproduce:
Actual results: eSpeak does not read content between the `<` (less-than) and `>` (greater-than) symbols.
Expected results: Everything should be read, no matter what is written there. This is how other Android TTS services behave.
I don't know whether SSML processing should be disabled entirely or whether the `<` and `>` characters should just be escaped.