Closed rsantos88 closed 2 years ago
This application uses the Google Speech Recognition and Synthesis APIs available for Google Chrome (not Firefox yet!) to:
I recall that in-browser voice recognition was causing trouble in asibot-main (currently webInterface in asibot-hmi) due to Chrome having dropped support for that. Last commit was three years ago, so I wonder whether this is still an issue.
We've had issues depending on a reliable Internet connection during conferences (e.g. ASIBOT and TEO) and contests (yes, for TIAGo, see private issue; it's officially legal). Therefore, IMHO, I'd prefer to focus on offline solutions.
However, I've added a Similar and Related Projects section with this information at https://github.com/roboticslab-uc3m/speech/commit/87fcab140e62b1448e7e9151df578dfb6e4b13bd (as we already have at kinematics-dynamics and other repos). ^^
Just to contribute my 2 cents.
If you are interested, the Kaldi Speech Recognition Toolkit implements the current state-of-the-art architectures for speech recognition. I am actually working with it, and the WER on the LibriSpeech corpus is around 6% (I have not tried any other corpus). Another good thing is that you will have complete control over the system and the models, and that it runs offline, so you don't have to worry about the network connection.
Running it on either TIAGo or TEO won't guarantee real-time performance, but the response time will probably be good enough for most cases.
The biggest drawback is that it requires a high level of expertise to build a system based on Kaldi, and probably a lot of hours of development, so you would need someone working exclusively on that for some time... That said, I think there are ready-to-use Kaldi-based implementations out there. Maybe it's worth looking for something like that.
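For context on the "WER around 6%" figure above: word error rate is just the word-level edit distance between the reference transcript and the ASR hypothesis, divided by the reference length. A minimal sketch in plain Python (the example sentences are made up for illustration):

```python
# Word Error Rate: Levenshtein distance over words (substitutions,
# insertions, deletions), divided by the reference word count.
# A WER of ~6% means roughly 6 word errors per 100 reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return 100.0 * d[len(ref)][len(hyp)] / len(ref)

print(wer("move the arm up", "move arm up"))  # one deletion out of 4 words -> 25.0
```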
Ref added at https://github.com/roboticslab-uc3m/speech/commit/760b1c1adb8ab306d2cb83f92ff2eff4fba36947, thanks!
Not doing per "if it works, don't touch it" and https://github.com/roboticslab-uc3m/speech/issues/8#issuecomment-968270204.
Actually doing it per "yeah, we have an ASR, but it's so bad it hurts" through Vosk (which uses Kaldi under the hood): 2dd945939c7142d374912b4a5a5dee130724872c. Notably, this solution is entirely offline and performs quite nicely with small (~50 MB) models.
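For anyone picking this up later, the typical Vosk usage pattern looks roughly like this. This is a sketch, assuming `pip install vosk` and one of the small downloadable models (e.g. a ~50 MB English model); the file paths are placeholders, not anything from this repo:

```python
import json
import wave

def transcribe(wav_path: str, model_path: str) -> str:
    """Offline transcription of a mono 16-bit WAV file with Vosk."""
    # Imported lazily so the module loads even without vosk installed.
    from vosk import Model, KaldiRecognizer

    wf = wave.open(wav_path, "rb")
    rec = KaldiRecognizer(Model(model_path), wf.getframerate())
    # Feed the audio in chunks, as Vosk expects for streaming recognition.
    while True:
        data = wf.readframes(4000)
        if not data:
            break
        rec.AcceptWaveform(data)
    # FinalResult() returns a JSON string with a "text" field.
    return json.loads(rec.FinalResult())["text"]

# Placeholder paths for illustration only:
# print(transcribe("command.wav", "vosk-model-small-en-us"))
```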
As an online alternative, it could be interesting to perform speech recognition and synthesis using the Google Speech Recognition and Synthesis APIs (better recognition without having to configure a dictionary, and clearer English pronunciation): https://github.com/robotology/yarp.js/tree/master/examples/speech_recognition