jsk-ros-pkg / jsk_3rdparty

[ros_speech_recognition] Set confidence when using Google #434

nakane11 closed 1 year ago

nakane11 commented 1 year ago

Currently, confidence in SpeechRecognitionCandidates is always set to 1.0. If show_all is True (the default is False), recognize_google returns the raw API response as a JSON dictionary, so we can read the confidence value from it and use it to compare results.

From https://cloud.google.com/speech-to-text/docs/speech-to-text-requests#confidence-values:

The confidence value is an estimate between 0.0 and 1.0. It's calculated by aggregating the "likelihood" values assigned to each word in the audio. A higher number indicates an estimated greater likelihood that the individual words were recognized correctly.
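
As a rough illustration of the idea above, the sketch below reads the confidence of the top result from a raw response. The helper name `best_candidate`, the sample dictionary, and the fallback of 1.0 when no confidence is present are assumptions made for this example, not the code in this pull request. The response layout (an "alternative" list of entries with "transcript" and an optional "confidence") is likewise assumed from the description above.

```python
def best_candidate(raw_response, default_confidence=1.0):
    """Return (transcript, confidence) for the top alternative, or None.

    raw_response is assumed to be what recognize_google(..., show_all=True)
    returns: a dict with an "alternative" list, or an empty value when
    nothing was recognized.
    """
    if not isinstance(raw_response, dict):
        return None
    alternatives = raw_response.get("alternative") or []
    if not alternatives:
        return None
    top = alternatives[0]
    # Fall back to the previous fixed value when no confidence is reported.
    confidence = float(top.get("confidence", default_confidence))
    return top.get("transcript", ""), confidence


# Hypothetical response, shaped like the raw JSON dictionary described above.
sample = {
    "alternative": [{"transcript": "hello", "confidence": 0.77}],
    "final": True,
}
print(best_candidate(sample))
```

Keeping 1.0 as the fallback would preserve the current behaviour whenever the engine omits a confidence value.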

Example

$ rostopic echo /Tablet/voice 
transcript: ['\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88']
confidence: [0.7732098698616028]
---
transcript: ['\xe3\x83\x86\xe3\x82\xb9\xe3\x83\x88']
confidence: [0.8018247485160828]
---

knorth55 commented 1 year ago

Good! Can we also get the other candidates too?

nakane11 commented 1 year ago

Sphinx may also return a confidence value, and Wit may return one at the word level. I will try the other engines when I have time.

knorth55 commented 1 year ago

Oh, I meant to ask whether we can get multiple candidates from the Google speech recognition engine, like ["hello", "hallo", "hollow"] with ["0.8", "0.7", "0.6"]. If so, we can use the other candidates in the future.

nakane11 commented 1 year ago

Sorry, I misunderstood.

I'm not sure whether it is available in speech_recognition, but if maxAlternatives in the request is greater than 1, the result can contain more than one candidate (result['alternative'][1], result['alternative'][2], ...). See "max_alternatives": https://cloud.google.com/python/docs/reference/speech/latest/google.cloud.speech_v1p1beta1.types.RecognitionConfig#:~:text=See%20%60Language%20Support-,max_alternatives,-int%0AMaximum%20number
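
If the response does carry several alternatives, they could be flattened into the parallel transcript and confidence lists that knorth55 describes. This is only a sketch under that assumption. The helper `to_candidates`, the sample data, and the 0.0 fill value for entries without a confidence are hypothetical. Whether speech_recognition actually requests or returns more than one alternative is not confirmed in this thread.

```python
def to_candidates(raw_response, missing_confidence=0.0):
    """Flatten result['alternative'] into (transcripts, confidences) lists."""
    transcripts, confidences = [], []
    if not isinstance(raw_response, dict):
        return transcripts, confidences
    for alt in raw_response.get("alternative") or []:
        if "transcript" not in alt:
            continue  # skip malformed entries
        transcripts.append(alt["transcript"])
        # Assumption: lower-ranked alternatives may lack a confidence value.
        confidences.append(float(alt.get("confidence", missing_confidence)))
    return transcripts, confidences


# Hypothetical response with three alternatives, mirroring the example above.
sample_multi = {
    "alternative": [
        {"transcript": "hello", "confidence": 0.8},
        {"transcript": "hallo", "confidence": 0.7},
        {"transcript": "hollow"},
    ],
    "final": True,
}
transcripts, confidences = to_candidates(sample_multi)
print(transcripts)
print(confidences)
```

The two lists stay index-aligned, so they could be assigned directly to the transcript and confidence fields of a SpeechRecognitionCandidates message.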