Create customizable initiation phrases

GoogleCodeExporter commented 9 years ago

Currently the Glass API kicks off based on the phrase "okay glass...". It would 
be nice to have the ability to create custom phrases based on authenticated 
applications. So if my app needed something more natural in a conversation that 
would fit.

As an extra bonus it would be nice if this could be done programmatically and 
not just through something like a manifest file. That way if my users wanted to 
customize a phrase for my app they could.

Original issue reported on code.google.com by jonbcam...@gmail.com on 17 Apr 2013 at 1:11

GoogleCodeExporter commented 9 years ago

I do have very specific examples that I am developing but would discuss further 
if requested.

Original comment by jonbcam...@gmail.com on 17 Apr 2013 at 1:16

GoogleCodeExporter commented 9 years ago

Original comment by mimm...@google.com on 17 Apr 2013 at 1:45

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

Issue 7 has been merged into this issue.

Original comment by mimm...@google.com on 17 Apr 2013 at 1:46

GoogleCodeExporter commented 9 years ago

One of the apps I've started building is a todo app and I would love it if my 
users could say "okay glass do the dishes tonight" and have my app be able to 
grab the text dictation to add it to their list.

Original comment by 4braham on 17 Apr 2013 at 2:33

GoogleCodeExporter commented 9 years ago

I still have reservations about widespread implementation of this. I really 
like the idea - but I can easily see Glassware registering tons of hooks which 
have no relevancy, or registering overlapping hooks. This is the equivalent of 
a home menu screen, and history has shown you don't want it to become too 
"cluttered", or whatever the equivalent may be.

Original comment by Prison4...@gmail.com on 17 Apr 2013 at 1:21

GoogleCodeExporter commented 9 years ago

I did think of that this morning also. Maybe another option is registering
command groups. So you would say "okay glass" (like a verbal home button)
then call your group or app "... Dining". This would then add commands such
as "calorie count of..." to your top level commands. I'm totally
making up the command example and it may not be perfect but you should get
the idea.

Original comment by jonbcam...@gmail.com on 17 Apr 2013 at 1:46

GoogleCodeExporter commented 9 years ago

I can see the potential for abuse, though I don't think it would be a big issue 
if the user can manage which voice commands from their subscriptions will be 
active.  

In the event that there is still a conflict it could function similarly to 
how Android does when there are multiple apps that handle the same function and 
let the user choose (through a follow up voice command).  

Say for example I subscribed to two to-do apps and say "okay glass, do the 
dishes tonight..."  Glass would then show me which two apps can handle the 
command, and then I could either say "...with To-Do-App-Name", "...with Google 
Keep", or even "...with both."  Doing this in such a way as to complete the 
command in a single sentence would flow nicely.
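
A rough sketch of that flow in Python, purely to illustrate the idea. The registry, the app names and the `resolve` function are made up for this example and are not part of any Glass API; it assumes the text after "okay glass" has already been transcribed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: first word of a command -> apps that registered it.
REGISTRY: Dict[str, List[str]] = {"do": ["To-Do-App-Name", "Google Keep"]}


def resolve(
    utterance: str, registry: Dict[str, List[str]]
) -> Optional[Tuple[str, List[str]]]:
    """Return (command text, target apps); an empty list means "ask the user"."""
    words = utterance.split()
    if not words or words[0].lower() not in registry:
        return None  # no app registered this command
    candidates = registry[words[0].lower()]
    if len(candidates) == 1:
        return utterance.strip(), list(candidates)

    # Look for a trailing "... with <app>" or "... with both".
    head, sep, tail = utterance.rpartition(" with ")
    if sep:
        choice = tail.strip().lower()
        if choice == "both":
            return head.strip(), list(candidates)
        picked = [app for app in candidates if app.lower() == choice]
        if picked:
            return head.strip(), picked
    # Conflict and no choice spoken yet: prompt for a follow-up voice command.
    return utterance.strip(), []
```

With the registry above, "do the dishes tonight" comes back with an empty target list (so Glass would show both apps), while "do the dishes tonight with Google Keep" goes straight to one app.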

Original comment by b...@bytevoid.net on 17 Apr 2013 at 4:09

GoogleCodeExporter commented 9 years ago

I think that since the trigger is voice activated it will be easier to handle a 
larger number of keywords. E.g. third-party ones could be hidden until the user 
starts speaking the correct trigger phrase. And as a user gets further into a 
trigger phrase the options that don't match could get hidden. I don't think it 
will be as bad as Android where when you select share you get a list of 20 apps 
that can handle text.
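
Something like this, as a minimal Python sketch. The trigger list and the `matching_triggers` name are invented for illustration, and it only compares whole words that have already been recognized.

```python
from typing import List


def matching_triggers(spoken_so_far: str, triggers: List[str]) -> List[str]:
    """Keep only the trigger phrases that start with the words spoken so far."""
    spoken = spoken_so_far.lower().split()
    result = []
    for trigger in triggers:
        words = trigger.lower().split()
        # A trigger survives while its leading words equal what was spoken.
        if words[: len(spoken)] == spoken:
            result.append(trigger)
    return result


# Hypothetical set of built-in and third-party triggers.
TRIGGERS = ["take a picture", "take a note", "record a video", "sky planner"]
```

Before the user says anything every trigger matches, so third-party ones could simply stay hidden until the list has been narrowed by the first word or two.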

Original comment by 4braham on 17 Apr 2013 at 4:37

GoogleCodeExporter commented 9 years ago

How about "okay glass, activate *app_name*"? 

The service would send an "initiation" timeline card with available actions.

Original comment by kwu...@gmail.com on 18 Apr 2013 at 2:03

GoogleCodeExporter commented 9 years ago

Yup. That is what I am thinking would be nice if voice commands get out of
control.

Original comment by jonbcam...@gmail.com on 18 Apr 2013 at 2:13

GoogleCodeExporter commented 9 years ago

I'd like to see this too. Imagine I wanted to make Glass control my Sky TV STB, 
I might want to be able to say things like:

"Okay Glass; Sky Planner" - Open Planner (recorded shows)
"Okay Glass; Channel {blah}" - Switch to channel {blah}

The user should be shown all of these prefixes at install time, like other 
permissions. Whether they need to be customisable, I'm not sure (currently, you 
can't install an Android app but block one of its permissions, as much as I wish 
you could).

It'd also be good to grab phrases following, so you could do:

"Okay glass; Sky Planner"... wait for it to open "Play {show name}"

However, this would be more complicated to do in a way that ensures an app 
doesn't just soak up all audio without the user knowing. There would need to be 
something on-screen alerting the user that they were still interacting with the 
app. I'm sure you could figure something out.

Original comment by da...@tuppeny.com on 18 Apr 2013 at 2:37

GoogleCodeExporter commented 9 years ago

Original comment by ala...@google.com on 18 Apr 2013 at 3:20

Added labels: log-7975520

GoogleCodeExporter commented 9 years ago

The more I think about it, the more I think this would be unnecessary and very 
un-Glass-like. Particularly since we already have the ability to send messages 
to contacts and Glassware are valid contact destinations. The biggest change I 
might want would be to change the phrase "send a message" to "tell" (or 
possibly add it, along with other aliases such as "remind"). So commands would 
sound like:

"ok glass, tell HAL to open the pod bay doors"
"ok glass, tell weps to fire the photon torpedoes"
"ok glass, remind my task list to do the dishes"
"ok glass, send a message to my tv to play {show name}"

These are all more natural sounding than most of the alternatives, they don't 
require any configuration beyond what is already necessary, are entirely 
consistent with how Glass currently behaves, they seem to meet the needs of 
having "apps installed on the phone", and can be largely implemented today.
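
To make the shape of those commands concrete, here is a small Python sketch that splits the transcribed text (after "ok glass") into a target and the message for it. The alias list, the regular expression and the `route` function are assumptions for this example only; nothing here is an existing Glass API.

```python
import re
from typing import Optional, Tuple

# Hypothetical aliases for "send a message to", followed by "<target> to <command>".
PATTERN = re.compile(
    r"^(?:tell|ask|remind|send a message to)\s+(?P<target>.+?)\s+to\s+(?P<command>.+)$",
    re.IGNORECASE,
)


def route(utterance: str) -> Optional[Tuple[str, str]]:
    """Return (target, command), or None if the text is not a "tell" style command."""
    match = PATTERN.match(utterance.strip())
    if match is None:
        return None
    return match.group("target"), match.group("command")
```

The target is taken up to the first standalone "to", so a contact whose name itself contains the word "to" would need a lookup against the list of known contacts instead.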

Original comment by Prison4...@gmail.com on 18 Apr 2013 at 4:05

GoogleCodeExporter commented 9 years ago

While I agree with you that the examples that have been discussed aren't
very 'glass-like' I have specific examples (that I can't share due to NDA)
that would be very strange to say "okay glass" in front of each request.

Original comment by jonbcam...@gmail.com on 18 Apr 2013 at 4:12

GoogleCodeExporter commented 9 years ago

Original comment by mimm...@google.com on 18 Apr 2013 at 9:08

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

Issue 19 has been merged into this issue.

Original comment by mimm...@google.com on 22 Apr 2013 at 4:19

GoogleCodeExporter commented 9 years ago

As Jenny noted on StackOverflow 
(http://stackoverflow.com/questions/16137974/how-do-i-send-a-message-directly-to-my-app), 
glassware isn't a valid contact.

I still maintain making it a valid contact might be a better and more 
consistent solution.

Original comment by Prison4...@gmail.com on 22 Apr 2013 at 3:16

GoogleCodeExporter commented 9 years ago

Making it a contact would be a nasty hack. Given it's near impossible to take 
away from an API, it should be done properly. Your app is not a contact and the 
user shouldn't need to use strange phrases to interact with it as if it were. 
"OK glass, tell Sky to record Game of Thrones" vs "OK glass, record Game of 
Thrones". It should be natural.

Original comment by da...@tuppeny.com on 22 Apr 2013 at 4:25

GoogleCodeExporter commented 9 years ago

Danny, the problem with that is that everyone and their cat is going to want to 
"own" the initiation phrase "record." If Google wants devs to make lots of 
Glassware, then they should expect users to want to install and use lots of 
Glassware as well. Which means apps will need to differentiate themselves.

Original comment by andrewrabon on 22 Apr 2013 at 4:30

GoogleCodeExporter commented 9 years ago

I don't see how that's a problem; Android handles more than one app 
registering for "Share". Users should be in full control. If I only want one app to 
respond to Record, I should be able to have that option. If I want multiple, I 
can make a selection. If I want a specific app, it can register a more specific 
phrase in addition, such as "sky record".

Using contacts to represent apps makes no sense. What if an app wants to handle 
two types of commands?

There's also no requirement for apps to be fake contacts to have a similar API; 
but it certainly shouldn't be the only way; it would be terribly restricting 
and awkward.

Original comment by da...@tuppeny.com on 22 Apr 2013 at 4:43

GoogleCodeExporter commented 9 years ago

I think the idea of prefixes is important.  Prefixes of different apps may 
conflict, but I think users should be able to choose which one wins out.  We 
could allow the user to change an app's prefix if they have two that do conflict.

On Android, multiple apps can handle the same type of resource.  If I hit a 
link it asks me if I want Chrome or Browser or LastPass, etc. to handle the 
link, always or this time only.  Glass could work the same way on prefixes.

I still can't imagine the voice command doing much other than sending the 
command text to the glassware for handling. All Google can really do is do the 
speech-to-text conversion, understand which app handles a prefix, and then send 
the command and get the response from our glassware.

Original comment by mike.wes...@workiva.com on 22 Apr 2013 at 4:45

GoogleCodeExporter commented 9 years ago

Android "handles" this by bringing up an additional confirmation screen. 
Somehow I don't think you're proposing that the interaction be something like 
this:
"ok glass, record Game of Thrones"
[Card pops up with choice of apps]
[You swipe to select the app]

I can see you suggesting that this choice be made when they enable the contact, 
but this is a procedure that already seems more complex than it should be, and 
I can't imagine adding one more step to the process would make it easier.

If an app wants to handle two types of commands, then it would be sent the 
commands. This is part of the simplicity of my proposal - everything after 
"tell <target>" is sent to the app. I can have the app handle hundreds of 
commands if it made sense for the app:
"ok glass, tell the tv to record Game of Thrones"
"ok glass, tell the tv to delete this show"
"ok glass, tell the tv to skip the commercial"
"ok glass, tell the tv to go back 30 seconds"
"ok glass, tell the tv to start over"
"ok glass, tell the tv to pause"

Original comment by Prison4...@gmail.com on 22 Apr 2013 at 7:17

GoogleCodeExporter commented 9 years ago

As a twist on Comment #9, Google Now currently supports the method of invoking 
an app using "Launch Pandora", or "Open Netflix" when the app is actively 
listening for audible input. Would this not be possible for the main Glass 
screen as a means of navigating to a registered service? 

So announcing at the home screen, "OK, Glass, launch New York Times" would jump 
to that service's bundle of bundles? Obviously this wouldn't apply for things 
like Skitch that are share-only commands, but the concept is more for quick 
navigation.

Original comment by jasonsal...@gmail.com on 26 Apr 2013 at 12:33

GoogleCodeExporter commented 9 years ago

While that is an awesome feature, it still doesn't help with the "okay glass..."
part. That is what I am wanting to customize. Now I am not trying to just
change things from "okay glass" to something like "computer" (which would
be geeky cool) but more on an app by app level. I actually think that
something as common as "record" isn't necessary; however, there are certain
business reasons that a more custom initialization phrase would be helpful.

Original comment by jonbcam...@gmail.com on 26 Apr 2013 at 12:50

GoogleCodeExporter commented 9 years ago

I think this needs splitting into two cases?

1. Ability to change the term "OK Glass"
2. Ability to register voice actions of some sort (that will still be prefixed 
with "OK Glass")

Original comment by da...@tuppeny.com on 26 Apr 2013 at 2:37

GoogleCodeExporter commented 9 years ago

I completely agree with splitting this into a new case that is #2 above.

Also, there could be a #3 which would be:

As Jenny noted on StackOverflow 
(http://stackoverflow.com/questions/16137974/how-do-i-send-a-message-directly-to-my-app), 
glassware isn't a valid contact.

I still maintain making it a valid contact might be a better and more 
consistent solution.

Original comment by cecilia....@gmail.com on 26 Apr 2013 at 3:51

GoogleCodeExporter commented 9 years ago

Well I would also want
3. ability to change the term "okay glass" on an application by application
basis and maybe even programmatically.

Original comment by jonbcam...@gmail.com on 26 Apr 2013 at 4:32

GoogleCodeExporter commented 9 years ago

Ok, I'll go on record (again) to state the unpopular decisions:

Regarding changing "ok glass" - I strongly disagree that this should be 
tampered with in any way, shape, or form. I can understand there are use cases 
that might benefit from keying off a different phrase (tho I'm not sure I can 
think of any offhand). I can understand the desire to personalize your own 
Glass. I can understand wanting to change it as a security measure. But I think 
the potential for abuse of this is strong as well. One of the arguments for 
keeping "ok glass" is that people around us will be aware that we are doing 
something with Glass. This is a direct counter-argument to the notion that we 
can be doing something secretly. Being able to change this phrase to something 
like "hi there" would severely undermine that argument.

I am not convinced that other voice actions are necessary if Glassware can be 
set as a contact for voice messages. I could be convinced that other voice 
actions might be set up as aliases for "send a message" ("tell" and "ask" come 
to mind), but I don't think that should be on an app-by-app basis.

In the SO message that Cecilia mentioned, Jenny suggested to star this issue 
and discuss it here. I don't have a problem splitting the issues - but it 
sounds like the Glass team is considering it as one thing.

Original comment by Prison4...@gmail.com on 26 Apr 2013 at 5:34

GoogleCodeExporter commented 9 years ago

My 2 cents: Don't mess with 'ok glass', but allow adding apps as contacts, so 
that I can send a message to my app.

HOWEVER, allow the app to create a contact that is immediately active. The 
extra step of having the user go to google.com/myglass to add the contact, 
and then enable it as a sharing contact, is just too much of a 
hurdle.

Original comment by arthur.v...@gmail.com on 27 Apr 2013 at 4:12

GoogleCodeExporter commented 9 years ago

Although slightly off-topic, I agree with not having the extra step of enabling 
a share target on Glass. It should be part of the OAuth process and by default 
it should appear enabled. I understand the need to disable it at a later time 
if the user wants to review the active share targets at any time.

Original comment by cecilia....@gmail.com on 27 Apr 2013 at 6:55

GoogleCodeExporter commented 9 years ago

How difficult might "strung commands" be with the Glass API?  In other words, 
if you want to give Glass a series of commands like so:

"OK Glass, command series: Take a photo THEN Record a video THEN Send an email"

In other words, rather than having to repeat the initial command "OK Glass", 
you save some time and strain on the user. The trick is whether or not Glass is 
smart enough to accept "THEN" or "FINALLY" etc type commands, plus smartly 
monitoring logical pauses or cadence in someone's speech.  

For example, I tend to stop talking/pause between thoughts.  That would be a 
real problem in a situation like this unless you enter a mode for the command 
such as "command series" (I am sure someone could come up with a more sexy 
phrase for this concept).
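
As a rough illustration of the text side of this (not the pause or cadence detection), a Python sketch could split an already transcribed utterance on the separator words once the series mode has been entered. The "command series:" prefix comes from the example above; the `split_series` function and the separator list are hypothetical.

```python
import re
from typing import List

# Hypothetical separator words between commands in a series.
SEPARATORS = re.compile(r"\s+(?:THEN|FINALLY)\s+", re.IGNORECASE)
PREFIX = "command series:"


def split_series(utterance: str) -> List[str]:
    """Split a "command series:" utterance into its individual commands."""
    text = utterance.strip()
    if not text.lower().startswith(PREFIX):
        # Outside series mode the whole utterance is one command.
        return [text]
    body = text[len(PREFIX):]
    return [part.strip() for part in SEPARATORS.split(body) if part.strip()]
```

Only splitting inside the explicit mode avoids chopping up an ordinary dictated sentence that happens to contain the word "then".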

-Jesse

Original comment by jessehe...@gmail.com on 27 Apr 2013 at 9:38

GoogleCodeExporter commented 9 years ago

Just wanted to hop in.

Glass is Glass, and glassware exists on Glass. IMO the wake-up command for 
Glass ("Ok Glass") should not be editable, at least until the commonfolk are 
more familiar with Glass. 

That said, I can certainly see the functionality of letting apps either 

1) register one top-level command (Quora registering "ask Quora", so I can say 
something like, "Ok Glass, ask Quora 'What is the best sushi restaurant in 
Rolla, Missouri?'"), or 

2) be opened with a generic open command, so you can say something like, "Ok 
Glass, open Baby Monitor".

Allowing some flexibility for glassware to be opened in ways other than sharing 
content into them would open up the kinds of apps available on Glass, while 
limiting the way they can be opened prevents accidental opens and confusion in 
what exactly you need to say to interact with a new "app".

Original comment by chrono.j...@gmail.com on 3 May 2013 at 3:41

GoogleCodeExporter commented 9 years ago

Original comment by mimm...@google.com on 9 May 2013 at 9:34

Added labels: Component-Mirror-API

GoogleCodeExporter commented 9 years ago

My two cents: I view one of the main advantages of Glass as its ability to be 
used with no hand interaction.  E.g., I'm washing dishes so my hands are wet 
but I want to send a text message.  So the ability to add custom commands 
greatly extends the potential of Glass.

Original comment by DanLe...@gmail.com on 17 May 2013 at 3:59

GoogleCodeExporter commented 9 years ago

I also want to add custom commands for the apps I have developed. The use case 
that I have in mind right now is something like, "Okay glass, how much money is 
left in my checking account?" or something like that.

I also would really love a Wolfram Alpha hook so I could say, "Okay Glass, 
Wolfram Alpha The air speed velocity of an African swallow".

I say allowing the users to change which voice hooks are displayed/activated is 
the correct way to do this as well.

Original comment by jes...@jessieamorris.com on 7 Jul 2013 at 10:26

GoogleCodeExporter commented 9 years ago

Issue 144 has been merged into this issue.

Original comment by mimm...@google.com on 22 Jul 2013 at 9:10

GoogleCodeExporter commented 9 years ago

We added two voice commands that you can use in your Glassware: 
https://developers.google.com/glass/contacts#declaring_voice_menu_commands

You can also request that new commands be added: 
https://services.google.com/fb/forms/glassvoicecommand/

Original comment by mimm...@google.com on 8 Oct 2013 at 1:07

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

Really shouldn't be marked as fixed. There are examples directly above that 
can't be implemented with "take a note" and "post an update", like checking a 
bank account.

Really disappointing Google went this direction instead of letting app 
developers add any voice command label they want and letting users choose which 
apps they want to install. This is very limiting and not the way Android and 
Google Play became so successful. Really disappointing the APIs on Glass are 
intentionally crippled and it will never have the amazing ecosystem Google Play 
has.

Original comment by lna...@gmail.com on 8 Oct 2013 at 1:31

GoogleCodeExporter commented 9 years ago

@Inanek: That's what the linked form is for. If there are other voice commands 
you'd like Glass to support, please let us know: 
https://services.google.com/fb/forms/glassvoicecommand/

Original comment by mimm...@google.com on 8 Oct 2013 at 1:35

GoogleCodeExporter commented 9 years ago

I'm a PM on the Glass team for our voice experience, so I wanted to chime in a 
bit on our thinking here.

Right now, we do have a policy that we need to explicitly approve all voice 
commands. Part of the reason is that we want to make sure that voice commands 
are consistent. To give an example, we want the command to describe what you're 
trying to accomplish ("ok glass, get directions to Pizza Hut") instead of what 
software you want to use to accomplish it ("ok glass, open Google Maps").

But the bigger reason is that for every voice command we add, we build a 
hand-tuned acoustic model for recognizing that command. We make sure that the voice 
command doesn't overlap too closely with our existing commands, and tune those 
existing models if necessary (this avoids false positives). We make sure that 
the voice command handles different accents. This hand tuning is why the voice 
recognition on Glass is so high quality and fast. We very much want to keep it 
that way.
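
For developers who want a quick sanity check before submitting a phrase, here is a very rough stand-in written in Python. It compares the spelling of phrases with `difflib.SequenceMatcher` from the standard library, which is only a text-similarity proxy and is not the acoustic comparison described above; the `conflicts` function and the 0.8 threshold are arbitrary choices for this sketch.

```python
from difflib import SequenceMatcher
from typing import List


def conflicts(new_phrase: str, existing: List[str], threshold: float = 0.8) -> List[str]:
    """Return existing phrases whose spelling is suspiciously close to the new one."""
    close = []
    for phrase in existing:
        # ratio() is 1.0 for identical strings and falls toward 0.0 as they diverge.
        score = SequenceMatcher(None, new_phrase.lower(), phrase.lower()).ratio()
        if score >= threshold:
            close.append(phrase)
    return close
```

Phrases that look alike on paper do not always sound alike (and vice versa), so this can only catch the most obvious overlaps.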

As Jenny said, if you've got a voice command that you want us to support then 
please submit it to our form: 
https://services.google.com/fb/forms/glassvoicecommand/

Lately I've been replying to all voice requests within a few days ... so 
hopefully we can find the right command for you and kick off the process of 
getting a voice model built soon.

Original comment by jeffhar...@google.com on 8 Oct 2013 at 3:02

GoogleCodeExporter commented 9 years ago

Thanks Jeff for your clear explanation.

One question for you: in the case of glassware that would accept multiple 
commands, I know you discourage the use of free recognition text to indicate 
commands, but to what extent does that suggestion go? In the case of 
Genie, for example, there are multiple commands such as add to shopping list, 
add to to-do list, add to log, etc. 

One possibility would be that through the take a note voice command, all I'm 
taking are notes, but there are different types of notes that the glassware 
could pick up in the case specific words are mentioned.

The other possibility would be to have a ton of different voice commands and 
that would make the main list of voice commands very crowded.

How far do we go in the adding new voice commands line versus adding some 
context on the one command itself?

Original comment by cecilia....@gmail.com on 8 Oct 2013 at 3:23

GoogleCodeExporter commented 9 years ago

@cecilia There is the possibility for multiple share contacts that could work 
with one voice command phrase.

Original comment by ghchinoy on 8 Oct 2013 at 3:25

GoogleCodeExporter commented 9 years ago

Concur with Cecilia's question, and the general tone of the inquiry.

I hope that the "top level" voice commands are as generic and broad as possible 
- giving a similar launching point to multiple Glassware services, but allowing 
the Glassware to provide multiple contacts that can handle further details. I 
would NOT want the top menu to be littered with overly specific voice command 
launches.

Original comment by Prison4...@gmail.com on 8 Oct 2013 at 3:27

GoogleCodeExporter commented 9 years ago

That's a great question and honestly not something that I think we know the 
long term answer to.  We don't currently have the ability to add context to our 
voice commands (something like an "ok glass add to..." command that can be 
followed by a fixed list of possible things <shopping list, to do list, log, 
etc...>). I don't want to commit to anything, but it's definitely something 
we've debated.

I think for now, we'll push y'all to create different commands for different 
actions. The idea being that when you see the command on your screen you should 
have a very clear sense of what it will do: "ok glass, add to my shopping list" 
is really clear about what it will do whereas "ok glass, add to..." is not. 

We're also worried about the possibility of cluttering up the voice menu with 
too many commands. We may need to address that by intelligently ranking the 
commands in the menu, or by letting users choose which commands to expose from 
their installed application, or by doing what you suggest and having specific 
possible phrases that must follow commands. But we're going to cross that 
bridge as it comes.

Original comment by jeffhar...@google.com on 8 Oct 2013 at 3:37

GoogleCodeExporter commented 9 years ago

For contacts, you have to manually add them via the Glass app before they show 
up on your "shortlist". Why can't voice commands work the same way? If the 
user uses the todo app a bunch, they can add it to their top level commands, or 
if that is an algorithmic problem then just subset all user commands with the 
top level command "with". So therefore: 

("ok glass", "take a picture")
("ok glass", "with twitfacegram", "take a picture")

Looks something like this:
("ok glass", "with <app-name>", "<app argument>")

I think this solution would cover most use cases. Please correct me if I'm 
wrong.
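
A minimal Python sketch of how the text after "ok glass" could be turned into those tuples. The `parse` function and the installed-app list are made up for illustration and assume the app names are known in advance.

```python
from typing import List, Tuple


def parse(utterance: str, installed_apps: List[str]) -> Tuple[str, ...]:
    """Split the text spoken after "ok glass" into the tuple shapes shown above."""
    text = utterance.strip()
    if text.lower().startswith("with "):
        rest = text[len("with "):]
        # Try longer app names first so "foo bar" wins over "foo".
        for app in sorted(installed_apps, key=len, reverse=True):
            if rest.lower().startswith(app.lower() + " "):
                argument = rest[len(app) + 1:].strip()
                return ("ok glass", "with " + app, argument)
    # No "with <app-name>" prefix: treat it as a built-in top level command.
    return ("ok glass", text)
```

An unknown app name after "with" falls through to the two-element form here; a real implementation would more likely ask the user what they meant.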

Original comment by super3...@gmail.com on 8 Oct 2013 at 4:43

GoogleCodeExporter commented 9 years ago

super3, I think you have the perfect solution: 'ok glass with X do X'. Google, 
please make this happen ASAP!

Original comment by marty...@gmail.com on 29 Oct 2013 at 2:42

GoogleCodeExporter commented 9 years ago

I would love to change "Ok Glass" to "Ok Jarvis".

Original comment by ice2...@gmail.com on 18 Nov 2013 at 5:41

GoogleCodeExporter commented 9 years ago

Please note, the form for suggesting a voice command now resides here:

https://developers.google.com/glass/distribute/voice-form

Original comment by timothyj...@google.com on 14 Feb 2014 at 10:13

GoogleCodeExporter commented 9 years ago

Not to be overly negative here, but I don't know how the restriction on custom 
voice triggers does anything BUT make the entire experience worse. I mean I 
guess I'll just launch my immersion app and try to keep users there because I 
can't have them multitasking and get back to my app with any sort of reasonable 
actions.

Also, the request for custom triggers states that commands should be generic 
enough for other apps to be able to respond to the same command etc... and then 
some of the defaults are "learn a song" and "start a round of golf". Wtf?

I totally echo the suggestions above where even if you have a context of 
someapp and do "ok someapp start some action", that would be way more logical / 
common sense.

I guess for now, I'll just use a hugely inaccurate existing command like "check 
me in" because even though that has *NOTHING* to do with our software, I guess 
it's the least wrong.

Original comment by wwe...@gmail.com on 21 Apr 2014 at 7:57
