jimholgate / readtextextension

An Apache OpenOffice & LibreOffice extension to read text with an external program

Do not describe XML encodings of punctuation! #21

Closed: iam-TJ closed this issue 2 years ago

iam-TJ commented 2 years ago

On Debian 11 amd64 with libttspico and LibreOffice Writer v7.0.4.2.

The test document is a basic ODT text document that contains quotation marks, apostrophes, and similar characters from the low ASCII range (below 128).

The TTS engine reads out what sound like the XML encodings of common punctuation such as the single quote (used as an apostrophe), double quotes, and other ASCII punctuation.

For example, for "you've just" it literally says "you Ampersand Hash Thirty-Nine Semi-Colon ve just" instead of "you've just".

This is a problem when editing any document that uses ASCII punctuation, as most do.

I had presumed this was because, under the hood, the document text is passed on with its XML encodings intact rather than first being run through a parser to decode the XML entities. However, further investigation with manually created strings shows something rather different.
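To illustrate the presumption above (this is not the extension's actual code), here is a minimal Python sketch of decoding character references before text is handed to a TTS engine. The function name decode_entities is hypothetical; html.unescape comes from the Python standard library.

```python
import html


def decode_entities(text: str) -> str:
    """Turn references such as &#39; or &quot; back into the characters they encode.

    Hypothetical helper for illustration only; not taken from the extension.
    """
    return html.unescape(text)


if __name__ == "__main__":
    encoded = "You&#39;ve just been confused by Jim&#39;s add-on"
    print(decode_entities(encoded))
```

If the text reaching the speech engine had been through a step like this, the apostrophe would be spoken as part of the word rather than spelled out as an entity.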

Testing on the command line with, for example, pico2wave -l en-GB -w /tmp/test.wav "You've just been confused by Jim's add-on", the pronunciation is correct :-p

That caused me to notice that the /tmp/ directory is used to save both the audio file (/tmp/${USER}-speech.wav) and the text being spoken (/tmp/tts_str.txt). I was surprised to see that the regular ASCII characters are used in that file, and yet the XML entities were spoken!

Feeding that text file through pico2wave resulted in correct pronunciation, so I'm now rather confused as to what is happening.

cat /tmp/tts_str.txt | pico2wave -l en-GB -w /tmp/test.wav
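A quick way to double-check that the temporary text file really holds plain punctuation and no character references is a scan like the sketch below. The helper name find_entities is hypothetical, and the pattern is an assumption that only covers numeric references and simple named entities.

```python
import re
from pathlib import Path

# Matches decimal (&#39;), hexadecimal (&#x27;) and simple named (&quot;) references.
ENTITY_RE = re.compile(r"&(?:#\d+|#x[0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def find_entities(text: str) -> list:
    """Return every character reference found in text, in order of appearance."""
    return ENTITY_RE.findall(text)


if __name__ == "__main__":
    # Path taken from the report above; skipped quietly if the file is absent.
    path = Path("/tmp/tts_str.txt")
    if path.exists():
        found = find_entities(path.read_text(encoding="utf-8", errors="replace"))
        print(found if found else "no character references found")
```

An empty result would be consistent with the observation that the file itself contains ordinary ASCII punctuation.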

In the LibreOffice ReadTheText configuration dialog, Command line options is set to:

"(PICO_READ_TEXT_PY)" --language=(SELECTION_LANGUAGE_COUNTRY_CODE) "(TMP)"

I haven't altered any settings - this was a new install of ReadTheText, as I was researching ways to have long documents read aloud to avoid eye strain.

jimholgate commented 2 years ago

Thank you for your detailed report. This update resolves your problem.

You can now download the most recent version at the official Apache OpenOffice Extensions or the Document Foundation LibreOffice Extensions websites.