Open Moonbase59 opened 2 years ago
Thanks a lot for this.
In the future I plan to switch to speech-dispatcher (which has a Pico module, I think). And more importantly Foliate needs to properly parse and extract the contents of the book, including any SSML markup (though I don't know how many books actually include those). See #829. Then we can also keep our own set of default pronunciation tweaks if no pronunciation info is included in the book.
You’re welcome! Trying spd-say, I couldn’t find a PicoTTS option, but I didn’t really look very closely. The standard spd voices sound rather robotic on my system. SSML might be a nice option, though I think almost no one uses it.
As you see in my script, PicoTTS also has scripting options (which I use heavily in my home automation). Too bad they were bought and put in the drawer… they had great voices, back in the days.
Should you switch over to spd—or something else—please don’t remove the scripting possibility! There’s still much to be gained when writing some adaptations (just check sound variation and some pronunciation help I add by "brute-force" sed). This gives Foliate a real advantage. (I’m also using Calibre’s reader which can only handle PicoTTS unmodified, and it’s much worse.)
Is there a reason that there are no linebreaks (for paragraph separation, which PicoTTS uses), and that en dashes, em dashes, and ellipses are changed back to their ASCII equivalents? And the many semicolons added? I had to undo all of these again to make it pronounce better.
Is there a reason that there are no linebreaks [...] And the many semicolons added?
As I mentioned, Foliate currently does not parse and extract content properly. By "not properly" I mean that it uses Range.toString() (of the DOM API), which preserves all whitespace from the text nodes (I think). As a result, there may be no whitespace at all between paragraph elements, while newlines in the source are preserved even though they aren't supposed to be rendered.
Another problem is that it speaks each page separately so that Foliate can turn to the next page when it finishes speaking. This approach obviously has many problems.
So this is mainly what I want to change. For example, it could process the document and insert linebreaks at block element boundaries. That would be much better than Range.toString(). If the TTS program supports marks or other kinds of events, then Foliate should feed the whole page or element to the TTS program, and use marks to handle highlighting and page turning.
Speech-dispatcher is unrelated to all issues above. I want to switch to that for different reasons.
The first is that I do not want to reinvent the wheel. Currently Foliate is already sort of a very poor man's speech-dispatcher. It has the advantage of having a much, much simpler interface, but it lacks features such as selecting a different voice, speed, etc.
The second is security. In a sandboxed environment, ideally you don't want to allow Foliate to run arbitrary commands outside the sandbox. Speech-dispatcher is itself configurable and extensible, so there should be no significant loss of customizability if we limit access to only speech-dispatcher in the sandbox.
The last reason is that it is already used by many other apps such as Firefox or Chromium. So in a sense it might make things easier for users (no need to configure different apps separately).
But really, Foliate should not even care or know about TTS programs. Ideally it should just use the SpeechSynthesis Web API. It would help make Foliate's code more reusable and portable as the Web API can be run on any browser on any platform. Unfortunately that's not supported by WebKitGTK, which is ideally where all this TTS code should live, where it would also benefit other WebKitGTK apps like Epiphany. So that is why I wrote in the other issue that while it would use speech-dispatcher, we should still use the SpeechSynthesis API and only defer to speech-dispatcher under the hood.
Should you switch over to spd—or something else—please don’t remove the scripting possibility! There’s still much to be gained when writing some adaptations (just check sound variation and some pronunciation help I add by "brute-force" sed).
I do understand the value in that, but really it's more of a by-product of the fact that TTS support in Foliate is extremely barebones. You can even abuse the TTS command to launch other non-TTS programs, for example. But that's not really how it's meant to be used.
Design-wise speaking, this is no different from injecting userstyles or userscripts to modify the content of the book. So ideally, if this kind of scripting is to be supported by Foliate, it should be done properly with a proper plugin or userscript API.
Also it could be argued that for forcing a certain pronunciation, one should be able to configure it in the TTS program, rather than doing it specifically for Foliate (provided that the content extraction issues mentioned above are fixed in Foliate).
All your points are valuable—and correct. Let’s see how it eventually evolves, looking forward to it!
And yes, of course I’m brute-forcing a lot here, because TTS on Linux is still not too great, and we sadly won’t get any more development on PicoTTS.
Thank you for Foliate and for the script, which works perfectly with Foliate! Could you tell me how to find the phonetic spelling for adding some words in French? For instance, "Windows" (I'm planning a demonstration with Linux). I wrote this, but it failed:
ext=$(echo "$text" | sed 's|\bwindows\b|Win doz|gI')
Thanks a lot !
@Lume6: It's possible that I didn’t test all languages, which would leave the file /tmp/foliate-sox.wav missing. You can try a simple text on the command line like so:
FOLIATE_TTS_LANG_LOWER='fr'; echo "J'utilise Windows 10." | foliate-picotts
If you get something like an open() error and a message that it can’t delete /tmp/foliate-sox.wav, then it’s my fault… sorry for that.
Change this part in the script to have sox commands for all languages as follows:
# use sox to make output better understandable (voices are rather muffled)
# adding some treble in the range of +3 to +6 dB helps
# some voices might need a little bass reduction, use s/th like "bass -6 400"
# to avoid clipping, give headroom (gain -h) and reclaim afterwards (gain -r)
case "${FOLIATE_TTS_LANG_LOWER:0:2}" in
  "de")
    sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +6 gain -r
    ;;
  "en")
    sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
    ;;
  "fr")
    sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
    ;;
  "it")
    sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
    ;;
  "es")
    sox /tmp/foliate.wav /tmp/foliate-sox.wav gain -h treble +3 gain -r
    ;;
  *)
    cp /tmp/foliate.wav /tmp/foliate-sox.wav
    ;;
esac
(You can adjust the treble +3 to whatever is appropriate for French.)
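The per-language branches above differ only in the treble value, so the mapping could also be factored into a small helper. This is only a sketch: the function names sox_effects_for_lang and post_process are hypothetical and not part of the posted script, while the effect chains are copied from the case statement above.

```shell
# Hypothetical helper: print the sox effect chain for a language code.
# Accepts "fr" as well as longer tags such as "fr-fr".
sox_effects_for_lang() {
  case "$1" in
    de*)             echo "gain -h treble +6 gain -r" ;;
    en*|fr*|it*|es*) echo "gain -h treble +3 gain -r" ;;
    *)               echo "" ;;   # unknown language: no effects
  esac
}

# Hypothetical wrapper: always leaves an output file behind, so a later
# playback step never fails on a missing /tmp/foliate-sox.wav.
post_process() {
  pp_effects=$(sox_effects_for_lang "$1")
  if [ -n "$pp_effects" ]; then
    # Word splitting of $pp_effects is intentional: each word is one sox argument.
    sox "$2" "$3" $pp_effects
  else
    cp "$2" "$3"
  fi
}
```

It would be called as, for example, post_process "$FOLIATE_TTS_LANG_LOWER" /tmp/foliate.wav /tmp/foliate-sox.wav. The catch-all branch keeps the plain copy, so a language without tuned settings still produces output.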
For "Windows" (the operating system), it might even be better to use X-SAMPA phonemes, which PicoTTS supports. Try something like:
  "fr")
    lang="fr-FR"
    # "Windows" (the operating system)
    text=$(echo "$text" | sed 's|\bwindows\b|<phoneme ph=\"win.doz\"/>|gI')
    ;;
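To check a substitution like this without running the whole script, the sed expression can be tried on its own. This is a minimal sketch, assuming GNU sed (the \b word boundary and the I case-insensitive flag are GNU extensions); the function name fix_pronunciation_fr is hypothetical, and the phoneme string is the one suggested above.

```shell
# Hypothetical filter: replace the word "Windows" (any case) with the
# phoneme markup suggested above. Reads stdin, writes stdout.
fix_pronunciation_fr() {
  # \b keeps longer words such as "windowsill" untouched.
  sed 's|\bwindows\b|<phoneme ph="win.doz"/>|gI'
}
```

Piping a sentence through it, as in echo "J'utilise Windows 10." | fix_pronunciation_fr, shows exactly what text would reach PicoTTS.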
Sounds like: foliate-sox.wav.zip
Happy experimenting!
Updated version of the script: foliate-picotts.zip
Try:
FOLIATE_TTS_LANG_LOWER='fr'; echo "Je préfère Linux à Windows." | foliate-picotts
;-)
Meanwhile, it should be possible to use gTTS, but echo "$text" | gtts-cli - -l $FOLIATE_TTS_LANG_LOWER | play -q -t mp3 - -t alsa doesn't seem to work.
BTW, it should be mentioned that this is an ISO 639-1 language code, not a three-letter ISO 639-3 code (as used e.g. by Tesseract).
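A script that relies on the two-letter form could check it explicitly. The following is only a sketch: the function name to_iso639_1 is hypothetical, and it assumes the input is either a bare code or a tag with a region suffix such as fr-FR or en_US. It checks the shape of the code only, not whether the code is actually assigned in ISO 639-1.

```shell
# Hypothetical helper: reduce a language tag to a two-letter code,
# or fail if the primary part is not exactly two letters.
to_iso639_1() {
  code=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  code=${code%%[-_]*}          # drop a region suffix such as "-fr" or "_us"
  case "$code" in
    [a-z][a-z]) printf '%s\n' "$code" ;;
    *)          return 1 ;;    # e.g. a three-letter ISO 639-3 code like "fra"
  esac
}
```

Rejecting three-letter codes instead of truncating them avoids silent mistakes: the first two letters of an ISO 639-3 code are not always the ISO 639-1 code (spa versus es, for instance).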
Hi !
A reinstall of Linux made me lose foliate-picotts. Installing the program again from the foliate-picotts.zip archive fails with output I can't make sense of.
Here is the output of ./foliate-picotts (I added -vx to the shebang line: #!/bin/bash -vx):
bernard@bernard:~/apps$ ./foliate-picotts
#!/bin/bash -vx
# foliate-picotts -- Speak Foliate e-book using PicoTTS and PulseAudio
#
# Requirements:
# - pico2wave -- sudo apt install libttspico-utils
# - paplay -- Most modern systems now use PulseAudio for output
# - sox -- sudo apt install sox
# - sed -- POSIX standard command
#
# Use F5 within Foliate to start/stop speech.
#
# 2021-11-06 -- Matthias C. Hormann aka Moonbase59
# - Code cleanup, bugfixing, added sox post-processing for better understanding.
# - Added hypenation (en-dash, em-dash) support.
# 2021-11-09 -- Matthias C. Hormann aka Moonbase59
# .......
# 2022-01-01 -- Matthias C. Hormann aka Moonbase59
# - Final adaption for Foliate as TTS output script.
# - Added F5 start/stop handling (SIGINT to script by Foliate)
# 2022-02-13 -- Matthias C. Hormann aka Moonbase59
# - Add sox commands for FR, IT, ES, to prevent error (missing /tmp/foliate-sox.wav).
# - Add French "Windows" pronunciation (thanks @Lume6!).
text=$(cat) # get text from stdin into text buffer
++ cat
As you can see, it stops at this line with the ++ cat message:
text=$(cat) # get text from stdin into text buffer
Who would have an explanation?
Thank you
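One reading of the trace above: cat without arguments reads standard input until end of file, so when the script is started directly from a terminal with nothing piped in, it waits at this line for input rather than crashing. A minimal sketch of a guard, assuming one wanted the script to say so instead (the function name read_text and the usage message are hypothetical, not part of the posted script):

```shell
# Hypothetical replacement for a bare text=$(cat): refuse to block on a terminal.
read_text() {
  if [ -t 0 ]; then
    # stdin is a terminal, so no text was piped in
    echo "usage: echo 'some text' | foliate-picotts" >&2
    return 1
  fi
  cat
}
```

In the script this would be used as text=$(read_text) || exit 1. When Foliate pipes text in, the behaviour is the same as with plain cat.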
Good evening. In fact, I had reinstalled the distribution and had forgotten that I had to copy the script, or create a symbolic link to it, in /usr/local/bin, for example. It works very well. Thanks again to you!
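For anyone repeating this after a reinstall, the step described above could be sketched as follows. The function name install_script is hypothetical. The source path should be absolute so that the link still resolves from the target directory, and writing to /usr/local/bin normally requires root.

```shell
# Hypothetical helper: make a script executable and link it into a
# directory on PATH. $1 = absolute path to the script, $2 = target directory.
install_script() {
  chmod +x "$1" && ln -sf "$1" "$2/$(basename "$1")"
}
```

For example, as root: install_script "$HOME/apps/foliate-picotts" /usr/local/bin (the ~/apps location is taken from the terminal prompt shown earlier).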
The GTK 4 version now uses speech-dispatcher exclusively.
Probably one can still add back the scripting ability. But it should have a better interface that works similarly to how it currently works with speech-dispatcher:
Is your feature request related to a problem? Please describe.
I wanted to add offline TTS (Text-to-Speech), and I’m not happy with eSpeak or Festival, but use PicoTTS for many other things already (it supports EN, DE, FR, IT, ES).
Describe the solution you'd like
Assuming that most modern Linux systems already have sox and use PulseAudio, I wrote a little output script to be used with Foliate. Just copy it into your ~/bin folder or another appropriate location and make it executable (chmod +x foliate-picotts).
Describe alternatives you've considered
eSpeak, gTTS
Additional context
Here is my script – feel free to include it with your software and/or website!