Something else to consider as well: Amazon Polly
24 languages
Python and HTTP APIs
Reasonable pricing: "Pay-as-you-go $4.00 per 1 million characters"
Test page: https://console.aws.amazon.com/polly/home/SynthesizeSpeech (you need an account to try it)
Not sure if this is really useful and/or new to you, but Praat seems to be TTS software that still receives updates: http://www.fon.hum.uva.nl/praat/download_linux.html http://www.fon.hum.uva.nl/praat/manual/Scripting_6_9__Calling_from_the_command_line.html
@otsaloma, just to let you know: I started working on it and I hope to have the first version ready relatively soon.
All right, good. When I looked into this, flite sounded a lot better than espeak. I'd like to see only the best option supported, but we probably need to support different engines for different languages, or offline vs. online. This also means that we need to know which language the navigation instructions are in -- routing providers probably need to provide that information along with the narration.
The first raw prototype already works (and was tested while driving around). I'll polish it a bit and submit it as a WIP PR, so we can discuss whether it's implemented the way you imagined it as well.
Right now I used espeak, since it was already on my device. But it's rather trivial to support any of the possible options, since espeak/flite or anything else generate a WAV file which is played back separately. We can just make a preference list taking language selection and availability into account.
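In code, the synthesis step is roughly this simple (a minimal sketch assuming the espeak and flite command-line tools; the function name and fallback order are illustrative, not the actual code):

```python
import os
import subprocess
import tempfile

def synthesize(text, language="en"):
    """Synthesize text to a WAV file, return its path or None."""
    handle, fname = tempfile.mkstemp(suffix=".wav")
    os.close(handle)
    # Both espeak and flite write a plain WAV file, which the caller
    # can then play back separately, e.g. via QML Audio.
    commands = [["espeak", "-v", language, "-w", fname, text],
                ["flite", "-t", text, "-o", fname]]
    for command in commands:
        try:
            subprocess.check_call(command)
            return fname
        except (OSError, subprocess.CalledProcessError):
            continue
    return None
```

Since playback only needs the file path, the engines stay interchangeable.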
I have worked on it quite a bit and, after fixing a few niggles, will submit it as a WIP PR. I also looked into available TTS solutions and it seems that the best are mimic (flite-based) and picotts. Both TTS programs are packaged and available via openRepos.
I'll polish my work a bit more and hopefully will be able to submit it for discussion at the end of the weekend. However, please note this is a tentative plan and I am not sure how much time I will have this weekend.
No hurry on my side: I plan to release 0.32 with rerouting, and that still has the narrative page redesign and some testing left, so I think it'd be at least two weeks before I would look at voice navigation.
Good plan. Then I will write up a description of the implementation, and maybe I should submit the PR after you have finished and merged the reroute work (voice is built on top of that). It will take some time to write the description as well :)
I think the code is now ready for PR submission, discussion, and review. I have been testing all of it against OSM Scout Server / Valhalla, and the current code supports Mapzen and OSM Scout Server / Valhalla. The code has so far been developed at https://github.com/rinigus/poor-maps/tree/voice . I will submit it as a PR when reroute is merged. Here I outline what has been done. As you are busy with the reroute, I don't expect any fast response; however, it's better for me to submit this description while I remember all the details.
With voice commands, we have several restrictions. First, voice commands are not available in all languages, and it is probably common that the user wants to specify the voice command language. Second, the availability of voice synthesis engines is also rather limited; there are several options, more about that below. Third, when using better-quality voices, synthesis may take some time. The developed code handles these limitations as well as I could manage.
After some searching, I think we have three packages providing TTS on SFOS: mimic (based on flite), picotts, and espeak. Of these, mimic supports English only, picotts has a few other languages (de, es, fr, and it), and espeak has more, though espeak's quality is rather poor. All packages are available at OpenRepos (mimic and picotts I have uploaded myself from OBS).
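A hedged sketch of how the preference list could take language support and availability into account; the engine ordering and language lists follow the paragraph above, while the function and data names are assumptions (and the actual binary differs per package, e.g. picotts is commonly invoked as pico2wave):

```python
import shutil

# Engines in order of preference, best quality first.
ENGINES = ["mimic", "picotts", "espeak"]

# Languages per engine, per the paragraph above; espeak's list is
# abbreviated here, it supports many more.
LANGUAGES = {
    "mimic": ["en"],
    "picotts": ["en", "de", "es", "fr", "it"],
    "espeak": ["en", "de", "es", "fi", "fr", "it"],
}

def choose_engine(language):
    """Return the best installed engine that supports language."""
    for engine in ENGINES:
        if language in LANGUAGES.get(engine, ()):
            # shutil.which checks that the binary is actually installed.
            if shutil.which(engine):
                return engine
    return None
```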
To handle these options, I made a class VoiceEngineBase that is later used as a base class by the voice engines: VoiceEngineMimic, VoiceEngineFlite (supports flite if you prefer it), VoiceEnginePicoTTS, and VoiceEngineEspeak. These voice engines are used by VoiceCommand, which handles engine selection and the synthesis of the prompts.
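In outline, the class layout could look as below; the class names come from the paragraph above, while the method names and language lists are illustrative:

```python
import subprocess

class VoiceEngineBase:

    """Common synthesize-to-WAV logic shared by all engines."""

    languages = []  # languages the engine supports, set by subclasses

    def supports(self, language):
        return language in self.languages

    def make_wav(self, text, fname):
        raise NotImplementedError

class VoiceEngineEspeak(VoiceEngineBase):

    languages = ["en", "de", "es", "fr", "it"]  # among many others

    def make_wav(self, text, fname):
        subprocess.check_call(["espeak", "-w", fname, text])
```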
Voice prompts are given through Narrative and its Maneuver objects. Audio is handled through QML Audio by giving it the audio file. Compared to the earlier Maneuver, we now have to store and play verbal instructions. In Valhalla, these are alert (given some 200-300 meters before), pre-maneuver ("turn left now"), and post-maneuver ("continue for 1 km"). So the routing engine (Valhalla and others) would have to provide these data. If only a narrative is provided, it is used for the alert and pre-maneuver prompts.
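A sketch of the per-maneuver prompt fields this implies; the attribute names are illustrative (Valhalla's actual JSON fields are longer, e.g. verbal_pre_transition_instruction):

```python
class Maneuver:

    """One maneuver with its spoken prompts."""

    def __init__(self, narrative="", verbal_alert="",
                 verbal_pre="", verbal_post=""):
        self.narrative = narrative
        # If only the narrative is provided, use it for the alert and
        # pre-maneuver prompts, as described above.
        self.verbal_alert = verbal_alert or narrative  # ~200-300 m before
        self.verbal_pre = verbal_pre or narrative      # at the turn
        self.verbal_post = verbal_post                 # after the turn
```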
Compared to the earlier Narrative, the proposed version has to define current_maneuver (to know when to play the post-maneuver prompt), interface with VoiceCommand, and remember which prompts have already been voiced (at present, the corresponding prompt is just deleted from a copy of the current maneuver after it has been voiced). To keep track of maneuvers, Narrative has begin and end methods called at the start and end of navigation by QML Map.
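A minimal sketch of that bookkeeping, assuming Narrative already has a maneuvers list and a voice_command attribute (both assumptions here); voicing a prompt blanks it on a working copy so it never repeats:

```python
import copy

class Narrative:

    def begin(self):
        """Called by QML Map when navigation starts."""
        self.current_maneuver = None
        # Working copies whose prompts are blanked once voiced.
        self.pending = [copy.copy(m) for m in self.maneuvers]

    def end(self):
        """Called by QML Map when navigation ends."""
        self.current_maneuver = None
        self.pending = []

    def voice_prompt(self, maneuver, kind):
        """Voice the given prompt once, then forget it."""
        text = getattr(maneuver, kind, "")
        if not text: return
        setattr(maneuver, kind, "")  # mark as voiced
        self.voice_command.play(text)
```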
To support slower TTS (like mimic's ap voice), I made the voices be synthesized in advance, at present three maneuvers ahead. This should also allow us to extend to online synthesis, if we wish.
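Pre-synthesis could then be as simple as the following sketch; the look-ahead of three maneuvers follows the description above, while the cache dict, the pending list, and the synthesize helper are assumptions carried over from the earlier sketches:

```python
def presynthesize(self, index):
    """Ensure WAV files exist for the next three maneuvers."""
    for maneuver in self.pending[index:index + 3]:
        for kind in ("verbal_alert", "verbal_pre", "verbal_post"):
            text = getattr(maneuver, kind, "")
            if text and text not in self.cache:
                # Synthesis can be slow (e.g. mimic's ap voice), so do
                # it here in advance; playback then just opens the file.
                self.cache[text] = synthesize(text, self.language)
```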
To ensure that we don't miss a maneuver, I reduced the period of the narration timer to one second. At present, the preferred voice gender is specified in Preferences (and honored when possible; picotts has only female voices). Maybe we can move it to the new Navigation page.
The overall preference for whether to enable voice commands is stored as voice_commands in config.py, but it doesn't have a GUI, as you suggested.
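For context, this amounts to a one-line addition to the defaults; a sketch assuming a DEFAULTS dictionary pattern in config.py, with the default value here being illustrative:

```python
DEFAULTS = {
    # ...existing keys omitted...
    # Whether to voice navigation instructions; no GUI toggle yet.
    "voice_commands": False,
}
```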
Congratulations and thank you very much!
Either here or http://talk.maemo.org/showthread.php?p=1520041#post1520041