robotology / assistive-rehab

Assistive and Rehabilitative Robotics
https://robotology.github.io/assistive-rehab/doc/mkdocs/site
BSD 3-Clause "New" or "Revised" License

Explore new options for triggering speech #250

Closed: vvasco closed this issue 1 year ago

vvasco commented 4 years ago

The option we explored for triggering speech is hand up/down detection through vision. We tried using both the 2D and the 3D skeleton. Even though the 3D approach is more robust (#246), vision alone has important limitations. First, it assumes the person is always in the FOV, which cannot be guaranteed in an HRI scenario. Moreover, in a TUG scenario, once the person starts walking, the robot focuses on the lower limbs to extract motion metrics. To observe the whole skeleton, given the narrow FOV of the camera, the distance between robot and person would have to be considerably large.

I'll describe here a list of options that came to mind and that we might want to explore.

1. Combining vision and sound

This option might overcome the limitations of vision and sound used alone. If we use a predefined vocal command alone to trigger the speech (for example, "Hey R1"), the system must either be always listening, making it very sensitive to noise, or listen for a predefined amount of time, which is unrelated to the length of the sentence and may thus turn out too long or too short. One solution could be to use the vocal command to focus the robot's attention on the upper limbs, with hand up/down detection providing the start/stop. Another solution could be to use the vocal command to provide the start and hand-down detection to provide the stop.

Limitations:
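The second solution above (vocal start, hand-down stop) could be sketched as a small state machine. This is only an illustration of the intended flow, not code from the repo: the class and method names (`SpeechTrigger`, `on_speech`, `on_hand_down`) are hypothetical, and the wake word "Hey R1" is taken from the example in the text.

```python
from enum import Enum, auto

class TriggerState(Enum):
    IDLE = auto()       # waiting for the vocal command ("Hey R1")
    LISTENING = auto()  # recording the question, waiting for hand down

class SpeechTrigger:
    """Hypothetical sketch: the vocal command provides the start,
    the hand-down detection provides the stop."""

    def __init__(self, wake_word="hey r1"):
        self.wake_word = wake_word
        self.state = TriggerState.IDLE
        self.buffer = []

    def on_speech(self, text):
        """Fed by the speech recognizer with each transcribed snippet."""
        if self.state is TriggerState.IDLE:
            if self.wake_word in text.lower():
                self.state = TriggerState.LISTENING
        else:
            self.buffer.append(text)

    def on_hand_down(self):
        """Called by the vision pipeline when the hand goes down;
        returns the accumulated question, or None if not listening."""
        if self.state is TriggerState.LISTENING:
            utterance = " ".join(self.buffer)
            self.buffer = []
            self.state = TriggerState.IDLE
            return utterance
        return None
```

With this split, the always-on microphone only has to spot the wake word, while the vision pipeline only has to watch the upper limbs once attention is focused there.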

2. Equipping the user with a button

The user has to keep a button pressed while asking questions. A possible device could be this, which can be connected directly to the Wi-Fi and exposes a REST API. Pressing the button could also trigger additional features, such as person following.

Limitations: the user must be equipped with a microphone.
Advantages: robustness.
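The button option boils down to push-to-talk gating: audio is buffered only while the button is held, and released audio is handed off for recognition. A minimal sketch, assuming hypothetical callbacks (`on_button`, `on_audio`) wired to the device's REST events and to the microphone stream:

```python
class PushToTalk:
    """Hypothetical sketch: buffer audio while the button is held;
    on release, pass the whole utterance to the recognizer."""

    def __init__(self, recognize):
        self.recognize = recognize  # callback: bytes -> None
        self.pressed = False
        self.chunks = []

    def on_button(self, down):
        """Called with True on press, False on release."""
        if down and not self.pressed:
            self.pressed = True
            self.chunks = []
        elif not down and self.pressed:
            self.pressed = False
            self.recognize(b"".join(self.chunks))

    def on_audio(self, chunk):
        """Called with raw audio chunks from the microphone stream."""
        if self.pressed:
            self.chunks.append(chunk)
```

Because the utterance boundaries come from the button itself, there is no fixed listening window and no noise sensitivity outside the press.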

3. Developing an app for the phone

The user interacts with the robot through the phone for asking questions. This would allow us to both manage the start/stop and remove the external microphone. We could use yarp.js.

Limitations: it could be difficult to adopt for patients walking with aids.
Advantages: we could remove the external microphone.
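On the robot side, the phone app could reduce to handling a simple message stream: a "start" message opens an utterance, "audio" messages carry the payload, and "stop" finalizes it. A minimal sketch of that routing, assuming a hypothetical JSON message format (the `type`/`payload` fields and the `PhoneSession` class are illustrative, not part of yarp.js):

```python
import json

class PhoneSession:
    """Hypothetical sketch of the robot-side handler for messages
    sent by the phone app (e.g. over a yarp.js or WebSocket channel)."""

    def __init__(self, on_utterance):
        self.on_utterance = on_utterance  # callback: str -> None
        self.active = False
        self.parts = []

    def handle(self, raw):
        """Route one JSON message from the phone."""
        msg = json.loads(raw)
        if msg["type"] == "start":
            self.active, self.parts = True, []
        elif msg["type"] == "audio" and self.active:
            self.parts.append(msg["payload"])
        elif msg["type"] == "stop" and self.active:
            self.active = False
            self.on_utterance("".join(self.parts))
```

Since the phone both records the audio and delimits the utterance, the robot needs neither the external microphone nor a vision-based start/stop.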

We could develop both options 2 and 3 and use one or the other according to the specific case.

cc @pattacini @vtikha

pattacini commented 4 years ago

In #280 @vvasco implemented option 2 (i.e. using a button). Let's keep this issue open, though, as relying on a phone app still seems intriguing 😉

The app could become a sort of project to assign to someone else.