Closed rsantos88 closed 2 years ago
This application uses the Google Speech Recognition and Synthesis APIs available for Google Chrome (not Firefox yet!) to:
I recall that in-browser voice recognition was causing trouble in asibot-main (currently webInterface in asibot-hmi) due to Chrome having dropped support for that. Last commit was three years ago, so I wonder whether this is still an issue.
We've had issues depending on a reliable Internet connection during conferences (e.g. ASIBOT and TEO) and contests (yes, for TIAGo, see private issue; it's officially legal). Therefore, IMHO, I'd prefer to focus on offline solutions.
However, I've added a Similar and Related Projects section with this information at https://github.com/roboticslab-uc3m/speech/commit/87fcab140e62b1448e7e9151df578dfb6e4b13bd (as we already have at kinematics-dynamics and other repos). ^^
Just to contribute my 2 cents.
If you are interested, the Kaldi Speech Recognition Toolkit implements the current state-of-the-art architectures for speech recognition. I am actually working with it, and the WER on the LibriSpeech corpus is around 6% (I have not tried any other corpus). Another good thing is that you will have complete control over the system and the models, and that it runs offline, so you don't have to worry about the network connection.
Running it on either TIAGo or TEO won't guarantee real-time performance, but the response time will probably be good enough for most cases.
The biggest drawback is that it requires a high level of expertise to build a system based on Kaldi, and probably a lot of hours of development, so you would need someone working exclusively on that for some time... That said, I think there are ready-to-use Kaldi-based implementations out there. Maybe it's worth looking for something like that.
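For context on the "WER around 6%" figure above: word error rate is just the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the reference length. A minimal sketch in plain Python (the example sentences are made up for illustration):

```python
# Word Error Rate: Levenshtein distance over words (substitutions,
# insertions, deletions), divided by the reference word count.
# A WER of ~6% means roughly 6 word errors per 100 reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("move the arm up", "move arm up"))  # one deletion out of 4 words -> 25.0
```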
Ref added at https://github.com/roboticslab-uc3m/speech/commit/760b1c1adb8ab306d2cb83f92ff2eff4fba36947, thanks!
Not doing per "if it works, don't touch it" and https://github.com/roboticslab-uc3m/speech/issues/8#issuecomment-968270204.
Actually doing it per "yeah, we have an ASR, but it's so bad it hurts" through Vosk (which uses Kaldi under the hood): 2dd945939c7142d374912b4a5a5dee130724872c. Notably, this solution is entirely offline and performs quite nicely with small (~50 MB) models.
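For anyone picking this up later, the typical Vosk usage pattern looks roughly like this. This is a sketch, assuming `pip install vosk` and one of the small downloadable models (e.g. a ~50 MB English model); the file paths are placeholders, not anything from this repo:

```python
import json
import wave

def transcribe(wav_path: str, model_path: str) -> str:
    """Offline transcription of a mono 16-bit WAV file with Vosk."""
    # Imported lazily so the module loads even without vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_path), wf.getframerate())
    # Feed the audio in chunks, as Vosk expects for streaming recognition.
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    # FinalResult() returns a JSON string with a "text" field.
    return json.loads(rec.FinalResult())["text"]

# Placeholder paths for illustration only:
# print(transcribe("command.wav", "vosk-model-small-en-us"))
```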
As an online alternative, it could be interesting to perform speech recognition and synthesis using the Google Speech Recognition and Synthesis APIs (better recognition without having to configure a dictionary, and clearer English pronunciation): https://github.com/robotology/yarp.js/tree/master/examples/speech_recognition