mozilla-extensions / firefox-voice

Firefox Voice is an experiment in a voice-controlled web user agent
Mozilla Public License 2.0

Allow individual wakewords to be aliased to a command #1118

Open ianb opened 4 years ago

ianb commented 4 years ago

We don't have a very expressive corpus of wakewords available, but there are a bunch of them, and they can all be detected at once.

It would be interesting, mostly for experimentation, if the user could configure a wakeword to not just start the recorder, but to execute a command directly. So "terminator" could mean scroll down, or something like that.
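
To make the idea concrete, here is a minimal sketch of such an alias table in Python. It is not Firefox Voice's actual code: `WAKEWORD_ALIASES`, `handle_wakeword`, and the two callbacks are hypothetical names, and "grasshopper" simply stands in for any wakeword without an alias.

```python
from typing import Callable, Dict, List

# Hypothetical user configuration: wakeword -> command phrase to run directly.
WAKEWORD_ALIASES: Dict[str, str] = {
    "terminator": "scroll down",
}


def handle_wakeword(
    wakeword: str,
    run_command: Callable[[str], None],
    start_recorder: Callable[[], None],
    aliases: Dict[str, str] = WAKEWORD_ALIASES,
) -> str:
    """Dispatch a detected wakeword; returns "command" or "recorder"."""
    command = aliases.get(wakeword.lower())
    if command is not None:
        # Aliased wakeword: skip the recorder and run the command at once.
        run_command(command)
        return "command"
    # Unaliased wakeword: fall back to the normal behavior.
    start_recorder()
    return "recorder"


# Stand-in callbacks that only record what would have happened.
log: List[str] = []
handle_wakeword("terminator", log.append, lambda: log.append("<recorder started>"))
handle_wakeword("grasshopper", log.append, lambda: log.append("<recorder started>"))
print(log)  # ['scroll down', '<recorder started>']
```

Keeping the fallback to the recorder means an unconfigured wakeword behaves exactly as it does today.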

awallin commented 4 years ago

Specific wake words that might be valuable to execute a command include:

• Next (slide)
• Close (Lightbox)
• Accept
• Play
• Pause
• Mute

ianb commented 4 years ago

Yes, if we can figure out our relationship with Picovoice I'm hoping we can establish a good vocabulary. For now it'll be a weird vocabulary, but still something we can fiddle around with.

GangaChatrvedi commented 4 years ago

I think a single word is too short to work reliably as a wake word, especially if it is a commonly occurring word like "next" or "close".

It will be hard to tell the difference between saying "next" in conversation with someone and wanting Pico to listen to me.

My suggested solution is to put firefox-voice into a long-term listening mode using spoken phrases, something like:

  1. Say "Hey Pico, listen to me" to start listening.
  2. Push multiple intents while the mode is active.
  3. Say "you can rest now" to stop listening.
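
The three steps above can be sketched as a small two-state session. This is only an illustration under assumptions: the class `ListeningSession`, the helper `normalize`, and the exact start and stop phrases are hypothetical, and matching on already-transcribed text glosses over the actual audio detection.

```python
from typing import List, Optional

# Hypothetical phrases taken from the suggestion above; not an existing API.
START_PHRASE = "hey pico listen to me"
STOP_PHRASE = "you can rest now"


def normalize(utterance: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class ListeningSession:
    def __init__(self) -> None:
        self.active = False
        self.intents: List[str] = []

    def hear(self, utterance: str) -> Optional[str]:
        """Return the intent to run, or None if the utterance is ignored."""
        text = normalize(utterance)
        if not self.active:
            # Outside the session only the start phrase has any effect.
            if text == START_PHRASE:
                self.active = True
            return None
        if text == STOP_PHRASE:
            self.active = False
            return None
        # Inside the session every utterance is treated as an intent.
        self.intents.append(text)
        return text


session = ListeningSession()
for heard in ["next", "Hey Pico, listen to me.", "next", "close",
              "You can rest now.", "next"]:
    session.hear(heard)
print(session.intents)  # ['next', 'close']
```

The first and last "next" are ignored because they fall outside the session, which is the point of the proposal: short words only count as commands while the long-term mode is on.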

ianb commented 4 years ago

One recommendation I've read is that a keyword/wakeword should have at least 6 phonemes. "Next" and "close" have, I think, three and two phonemes respectively (but I don't actually know how to count phonemes, so I'm guessing a bit).
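
As a rough check of that rule of thumb, the sketch below counts phonemes from hand-written ARPAbet-style transcriptions. The transcriptions, the `PRONUNCIATIONS` table, and the function names are assumptions made for illustration; a real check would look words up in a pronunciation dictionary. By these transcriptions "next" and "close" come out somewhat longer than the guess above, but still under six.

```python
from typing import Dict, List

MIN_PHONEMES = 6  # the rule of thumb quoted above

# Hand-written ARPAbet-style transcriptions (assumption, for illustration only).
PRONUNCIATIONS: Dict[str, List[str]] = {
    "next": ["N", "EH", "K", "S", "T"],
    "close": ["K", "L", "OW", "Z"],
    "terminator": ["T", "ER", "M", "AH", "N", "EY", "T", "ER"],
}


def phoneme_count(word: str) -> int:
    """Number of phonemes in the stored transcription of `word`."""
    return len(PRONUNCIATIONS[word.lower()])


def is_long_enough(word: str, minimum: int = MIN_PHONEMES) -> bool:
    """True if the word meets the minimum phoneme count for a wakeword."""
    return phoneme_count(word) >= minimum


for word in PRONUNCIATIONS:
    print(word, phoneme_count(word), is_long_enough(word))
```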

Right now we don't have any custom wakewords and aren't set up to create or add them, so that part of the design will have to wait a bit. But if we believe that we will someday have that option, then we can experiment with the limited set of words we have right now, even if the phrasing is awkward.