hello-robot / stretch_web_teleop

Remote web teleoperation for the Stretch mobile manipulators from Hello Robot Inc.

Operator-to-robot Text-to-Speech #64

Closed by hello-amal 1 month ago

hello-amal commented 2 months ago

Description

This PR adds operator-to-robot text-to-speech capabilities. Specifically, it adds:

  1. Backend:
    1. A TextToSpeechEngine abstract class that can be used to support multiple engines in a plug-and-play fashion. gTTS and pyttsx3 are implemented.
      1. This abstract class allows multiple voices, two speeds (slow and default), and interrupting an ongoing utterance.
    2. A ROS 2 node (and corresponding custom message) that receives text and additional metadata (voice, speed, whether to interrupt) on a topic and speaks it using the specified engine (currently gTTS).
  2. Frontend:
    1. A new basic component, DropdownInput, that behaves like Dropdown but has a textarea to the left of the dropdown arrow.
    2. A web app component, on the same level as "Movement Recorder," that allows users to type arbitrary text, save/delete it, play it on the robot, and stop a robot's utterance.
    3. The WebRTC and ROSLibJS data flows that enable the above.
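
The backend design described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the method names, the `SpeechRequest` fields, the `RecordingEngine` stand-in, and `handle_request` are all hypothetical, chosen only to show how an abstract engine with multiple voices, two speeds, and interruption might be driven by an incoming message.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechRequest:
    """Hypothetical stand-in for the custom ROS 2 message."""
    text: str
    voice: str = ""          # empty string means "keep the current voice"
    is_slow: bool = False    # two speeds only: slow and default
    interrupt: bool = False  # stop any ongoing utterance first


class TextToSpeechEngine(ABC):
    """Assumed shape of the plug-and-play engine interface."""

    def __init__(self) -> None:
        self._voice = ""
        self._is_slow = False

    @property
    @abstractmethod
    def voices(self) -> List[str]:
        """Voice identifiers this engine supports."""

    def set_voice(self, voice: str) -> None:
        if voice not in self.voices:
            raise ValueError(f"Unsupported voice: {voice!r}")
        self._voice = voice

    def set_slow(self, is_slow: bool) -> None:
        self._is_slow = is_slow

    @abstractmethod
    def say(self, text: str) -> None:
        """Speak the text with the current voice and speed."""

    @abstractmethod
    def stop(self) -> None:
        """Interrupt the ongoing utterance, if any."""


class RecordingEngine(TextToSpeechEngine):
    """Fake engine that logs calls instead of producing audio."""

    def __init__(self) -> None:
        super().__init__()
        self.log: List[str] = []
        self._voice = self.voices[0]

    @property
    def voices(self) -> List[str]:
        return ["en", "fr"]

    def say(self, text: str) -> None:
        speed = "slow" if self._is_slow else "default"
        self.log.append(f"say:{text}:{self._voice}:{speed}")

    def stop(self) -> None:
        self.log.append("stop")


def handle_request(engine: TextToSpeechEngine, request: SpeechRequest) -> None:
    """What a topic callback might do with one incoming request."""
    if request.interrupt:
        engine.stop()
    if request.voice:
        engine.set_voice(request.voice)
    engine.set_slow(request.is_slow)
    engine.say(request.text)


if __name__ == "__main__":
    demo = RecordingEngine()
    handle_request(demo, SpeechRequest(text="hello", voice="fr", interrupt=True))
    print(demo.log)
```

Because callers depend only on the abstract interface, a gTTS-backed or pyttsx3-backed subclass could be swapped in without changing `handle_request`.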

Select design decision

Testing procedure

Before opening a pull request

From the top-level of this repository, run:

To merge

hello-amal commented 1 month ago

Ran all tests on 3030 except the test that ensures the requirements are complete. @hello-vinitha can you run that one on your robot, since it is a "clean" install (i.e., it shouldn't have any of these audio libraries)?

hello-vinitha commented 1 month ago

@hello-amal All the tests pass on 2051. A couple of questions/suggestions:

hello-amal commented 1 month ago

  1. Well, gTTS uses an unofficial Google Translate API, which Google may stop supporting at any time. So I think it is important to keep pyttsx3 as an alternative, even though its voices are not good. I'll add a launch file flag to select the engine.
  2. Going back to our earlier discussion: changing "Play" to "Add to Queue" and only showing "Stop" while an utterance is playing would require the text-to-speech node to send feedback to the app. That means changing it to an action, which is a fairly involved change on both the web app and ROS node sides. I have created an issue so this can be done in a separate PR (#73).
  3. Will make the color change.
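
For illustration only, engine selection via a flag could look something like the sketch below. The flag name `--tts_engine`, the `parse_engine` and `create_engine` helpers, and the registry of factories are assumptions made for this example; the actual launch file flag added in this PR may be named and wired differently.

```python
import argparse
from typing import Callable, Dict, List

# Engine names taken from the PR description; gTTS is the current default.
ENGINE_CHOICES = ("gtts", "pyttsx3")


def parse_engine(argv: List[str]) -> str:
    """Read the (hypothetical) --tts_engine flag, defaulting to gTTS."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--tts_engine", choices=ENGINE_CHOICES, default="gtts")
    return parser.parse_args(argv).tts_engine


def create_engine(name: str, registry: Dict[str, Callable[[], object]]) -> object:
    """Build the engine registered under `name`, or fail with a clear error."""
    try:
        factory = registry[name]
    except KeyError:
        raise ValueError(f"Unknown text-to-speech engine: {name!r}") from None
    return factory()


if __name__ == "__main__":
    # Placeholder factories; real ones would construct the engine classes.
    registry = {
        "gtts": lambda: "gTTS engine placeholder",
        "pyttsx3": lambda: "pyttsx3 engine placeholder",
    }
    selected = parse_engine(["--tts_engine", "pyttsx3"])
    print(create_engine(selected, registry))
```

Keeping the factories in a registry means an engine's library is only needed when that engine is actually selected, which fits the goal of having pyttsx3 available if gTTS stops working.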

hello-vinitha commented 1 month ago

Ah yes, I completely forgot that we had discussed (2). That sounds good, we can revisit that.

hello-amal commented 1 month ago

Addressed the changes. Here is a screenshot of the updated color scheme.

[Image: Screenshot 2024-07-15 at 1 50 15 PM]