Closed vvasco closed 5 years ago
:thought_balloon: :speech_balloon:
The best option then for this is to use the architecture developed for the IBM JL. Do you confirm that we will be using English? If yes, then:
[ ] Use IBM speech to text (backup can be Google speech to text)
[ ] Use IBM services :cloud: to branch out all the possible responses
[ ] Speech can be triggered by hand raise :raising_hand: or even "hey R1" :speech_balloon:
[ ] Decide what mic to use: handheld or the embedded ones on R1 :microphone:
Unfortunately, the first language we've got to use is Italian 🤔, so we need to use Google services for speech to text.
Ok so we would have:
I completely agree @vvasco, but the mics on R1 are terrible. In any case, the mics are plug and play, so I can switch and test both easily.
We decided to use Google services for interpreting potential questions. As we don't plan to implement a conversation for now, but rather a simple natural question and answer mechanism, we can avoid IBM services and possibly resort to them later to support a more complex interaction.
The speech interpreter will manage questions (triggered by "hey robot" or hand raise), interpret them and send a corresponding keyword to `managerTUG` (port name: `/managerTUG/speech-interp:i`), which will formulate the answer accordingly and send it to `iSpeak`.
For now, `managerTUG` manages the following keywords:
- `speed`: "At which speed should I move?"
- `aid`: "Can I use a walking stick?"
- `repetition`: "How many repetitions should I do?"
- `feedback`: "How good am I in doing this?"
- `unclear`: the question is not understood;
- `unknown`: the question is not in the list of possible questions and the answer is unknown.

Important: we have to think of how to manage that the question generates an event, which can occur at any time and which the modules involved must be aware of.
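To make the keyword mapping and the "question as an event" concern more concrete, here is a minimal sketch. It is not the Google-based interpreter: plain word matching stands in for the speech service, the trigger words in `TRIGGERS` are illustrative guesses (in English, whereas the real system targets Italian), and a `queue.Queue` stands in for the `/managerTUG/speech-interp:i` port. The names `interpret` and `on_question` are hypothetical.

```python
import queue
import re

# Keywords come from the list above; the trigger words are assumptions
# made for this sketch, not the vocabulary used by the real interpreter.
TRIGGERS = {
    "speed": ("speed", "fast", "slow"),
    "aid": ("stick", "walker", "aid"),
    "repetition": ("repetition", "repetitions", "how many"),
    "feedback": ("how good", "feedback"),
}


def interpret(transcript: str) -> str:
    """Map a transcribed question to one of the managerTUG keywords."""
    # Lowercase, drop punctuation and collapse whitespace.
    text = re.sub(r"[^a-z\s]", " ", transcript.lower())
    text = " ".join(text.split())
    if not text:
        # Nothing usable was transcribed: the question is not understood.
        return "unclear"
    padded = f" {text} "
    for keyword, triggers in TRIGGERS.items():
        # Match whole words or phrases only, so "aid" does not fire on "said".
        if any(f" {trigger} " in padded for trigger in triggers):
            return keyword
    # Understood, but not among the known questions.
    return "unknown"


def on_question(transcript: str, events: "queue.Queue[str]") -> str:
    """Interpret a question and publish its keyword as an event."""
    keyword = interpret(transcript)
    # Questions can arrive at any time, so they are queued rather than
    # handled inline; the consumer decides when it is safe to answer.
    events.put(keyword)
    return keyword


if __name__ == "__main__":
    events: "queue.Queue[str]" = queue.Queue()
    on_question("Can I use a walking stick?", events)
    # A consumer (playing the role of managerTUG) polls without blocking.
    try:
        print(events.get_nowait())
    except queue.Empty:
        pass
```

Queuing the keyword instead of answering immediately lets the consuming module pick up the question at a point where an interruption is acceptable, which is one possible way to handle the asynchronous event described above.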
cc @vtikha @pattacini
The following PR deals with this by adding:
`managerTUG`
Done in #225.
Examples of potential conversation can be found here:
Given these considerations, we have to understand which tool to use and start to create a potential conversation for a meaningful verbal interaction.
cc @vtikha