mozilla-mobile / fenix

⚠️ Fenix (Firefox for Android) moved to a new repository. It is now developed and maintained as part of: https://github.com/mozilla-mobile/firefox-android
Mozilla Public License 2.0

Voice input for Search #1216

Closed vesta0 closed 4 years ago

vesta0 commented 5 years ago

Feature Goal

User Story

As a user, I want to be able to do a quick voice search, so I can search faster.

Acceptance Criteria

Note: This is already possible today: the user can tap the Android microphone icon on their keyboard to voice search. But since users request this often, we can only assume that the icon is hidden on some phones and/or users are used to seeing it in the search bar.

Fenix: [screenshot 20200507-171812]

Fennec: [screenshot 20200507-171805~2]

lime124 commented 5 years ago

Hey @vesta0, what is the specific question for UX? As I understand it, we're going to be using all native stuff, so there isn't anything for us to provide.

vesta0 commented 5 years ago

Confirmed with the team that this already works with the native Android voice search. We won't build anything extra for MVP.

FruityWelsh commented 4 years ago

Is there overlap between the voicefill project and this issue?

NotWoods commented 4 years ago

We could potentially use Mozilla SpeakToMe for voice search. This service is used for Voice Fill and WebThings Gateway. (I think it can be configured to use Common Voice as a dataset?)

SpeakToMe has a JS API that calls the server we can use as a template to figure out how to structure REST calls.

sblatz commented 4 years ago

I think this would not be a ton of engineering effort for us since we're already using this for the search widget :)

liuche commented 4 years ago

Some other concerns:

vesta0 commented 4 years ago

I think ideally we should only display the voice input option if the user's default search is Google.

liuche commented 4 years ago

> We could potentially use Mozilla SpeakToMe for voice search. This service is used for Voice Fill and WebThings Gateway. (I think it can be configured to use Common Voice as a dataset?)
>
> SpeakToMe has a JS API that calls the server we can use as a template to figure out how to structure REST calls.

Chatted w/ Abe from the Voice team. They're definitely interested in trying out some of their voice features (like the ones in Firefox Voice) in Fenix. For now, the backend for these voice-to-text libraries isn't yet using the Mozilla Common Voice DeepSpeech module, so we should hold off until that is integrated/ready.

brampitoyo commented 4 years ago

@vesta0 @liuche

As we think about voice search:

“open google dot com” “search for polar bears” “search amazon for microwave ovens” “wikipedia, who is Jacinda Ardern?”

It may be worth considering other super powers that Firefox Voice can give us, specifically around exposing ‘deep’ browser features:

“open settings” “delete my browsing history” “turn off tracking protection” “change my search engine to Bing”

yoasif commented 4 years ago

This seems very weird - I just looked at Chrome, Edge, Brave, Opera and Samsung Internet, and none of them offer a voice search.

I also don't think it follows that just because a user is using Google search, they want Google to listen to their voice.

Perhaps some mockups can clarify this a bit, but I don't see how putting a microphone icon in the main Fenix UI looks like anything other than a person's voice being transcribed by Common Voice or similar, especially with the privacy branding around Firefox generally.

vesta0 commented 4 years ago

@yoasif the specific feature I am proposing here (adding a voice icon to the search bar - very minimal) does already exist in Chrome, Brave, Opera, Samsung, and some other browsers.

yoasif commented 4 years ago

I may be missing something, but I don't see it in Brave, Chrome, Samsung, Opera.

I do however see it in Edge.

signal-attachment-2020-05-11-161042

In any case, voice search that is not locally transcribed is a real security and privacy question because users' voices are being sent to external companies whose privacy policies are likely not coherent with the ones listed for Fenix or Mozilla.

I would have zero issue with local transcription of voice inputs, but sending my (or any users') voice to another company is very concerning to me.

Remember that Google's hardware SVP said that he would disclose smart speakers to guests entering his home:

"Does the owner of a home need to disclose to a guest? I would and do when someone enters into my home, and it's probably something that the products themselves should try to indicate."

Fenix should have the same level of respect for its users. Doing a search shouldn't imply that massive corporations are literally listening to your voice.

cadeyrn commented 4 years ago

> I may be missing something, but I don't see it in Brave, Chrome, Samsung, Opera.

From these browsers I only have Chrome installed so I can't speak about the other browsers but there is a voice icon in Chrome.

yoasif commented 4 years ago

Not sure why, but I don't see one. I cleared app data to reset it to defaults and it is the latest release version provided by Play Store on my Pixel 2. What does it look like for you? Like the Edge screenshot above @cadeyrn ?

signal-attachment-2020-05-11-163816

cadeyrn commented 4 years ago

> What does it look like for you? Like the Edge screenshot above @cadeyrn ?

Screenshots from Chrome:

chrome

yoasif commented 4 years ago

Strange. There must be a setting somewhere that Chrome and the others are respecting here. Or a bug. :smile:

brampitoyo commented 4 years ago

The solution I would propose is to put search on the area above the keyboard – which already contains “Scan” and “Shortcuts”, like so:

Screen Shot 2020-05-12 at 4 23 11 PM

When tapped, it goes to Google voice input (different Android phones may have their own TTS service) and waits for the user to speak.

After the voice input has been transcribed into text, we go back to the Fenix app. From this point, the browser takes the keyword/URL and performs a search or goes to the address, in the same tab where the user pressed the “Speak” button.

What do you think, @sblatz?
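
A minimal sketch of the last step of that flow, deciding whether the transcribed text is an address or a search keyword. It is written in Python for brevity and is not Fenix code: `resolve_voice_input`, the URL pattern, and the search template are all hypothetical, and it assumes the recognizer already returns "google.com" rather than the spoken "google dot com".

```python
import re
from urllib.parse import quote_plus

# Hypothetical search template; the real engine would come from the user's settings.
SEARCH_TEMPLATE = "https://www.google.com/search?q={query}"

# Rough heuristic for "looks like an address": optional scheme, dotted host, optional path.
_URL_LIKE = re.compile(r"^(https?://)?([a-z0-9-]+\.)+[a-z]{2,}(/\S*)?$", re.IGNORECASE)


def resolve_voice_input(transcript: str) -> str:
    """Return the URL to load in the current tab for a transcribed phrase."""
    text = transcript.strip()
    if _URL_LIKE.match(text):
        # Address: add a scheme if the speaker's text did not include one.
        if text.lower().startswith(("http://", "https://")):
            return text
        return "https://" + text
    # Anything else is treated as a search keyword.
    return SEARCH_TEMPLATE.format(query=quote_plus(text))
```

Under these assumptions, `resolve_voice_input("mozilla.org")` loads the site directly, while a phrase such as "polar bears" is sent to the search engine.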

vesta0 commented 4 years ago

@brampitoyo what do you see as the downside of placing the voice icon in the search bar? I think it would be more visible/accessible there, and that is where users would expect to see it. Unless there is a specific reason we wouldn't want to add anything to the search bar.

@brampitoyo @betsymi we also need mocks/strings for disabling voice search as part of the Accessibility menu in settings.

yoasif commented 4 years ago

@cadeyrn I figured out why I wasn't seeing the mic input in Chrome and other apps -- I disabled the Google app on my device a long time ago -- re-enabling the app restores the mic input in various apps.

I agree with @vesta0 that it makes sense to put the mic icon where users expect to see it, even though it conflicts with my overall objection to using external TTS without a warning in Fenix.

That UI might be less cumbersome with https://github.com/mozilla-mobile/fenix/issues/7461#issuecomment-625039188 and the pill, but I kinda think that area feels a little overburdened to begin with, unfortunately. I think Fennec does the "this time search with" search better than Fenix (vs. the shortcuts) - and I think that is worth another re-think.

In any case, Fennec shows both the barcode scanner and the voice input in the search bar, and while the scan icon could stand to look a little prettier, I think it looks fine overall.

signal-attachment-2020-05-13-034430

I think that Fennec maintained a pretty clean separation between the address bar being an input (voice, text, barcode) and the bottom area being a modifier for input data, and the Fenix design kind of muddles that separation.

It would be nice if Fenix used the cue that Chrome, Brave and others have taken to disable Google voice input if the Google app is disabled in the user's device, notwithstanding any accessibility preferences - unless TTS via other vendors is supported, I would even be curious to see if it works -- Fennec doesn't display the mic icon if my Google app is disabled, FWIW.

vesta0 commented 4 years ago

@yoasif thanks for looking into this. FYI the first time a user taps on the mic icon they will be asked to accept or deny the voice recording permission. Would that address some of your concerns?

@sblatz @brampitoyo let's keep @yoasif's suggestion in mind as we build this.

> It would be nice if Fenix used the cue that Chrome, Brave and others have taken to disable Google voice input if the Google app is disabled in the user's device, notwithstanding any accessibility preferences - unless TTS via other vendors is supported, I would even be curious to see if it works -- Fennec doesn't display the mic icon if my Google app is disabled, FWIW.

@brampitoyo please share your final recommendation around the placement of the voice option.

yoasif commented 4 years ago

@vesta0 Well, the fact that at least on my device, the Google logo appears (like on @brampitoyo's mockup above) helps a lot. However, this doesn't change the fact that I have already given Fenix the ability to listen to me (again, totally fine with this) but the permission is effectively proxied to Google.

I'll have to do some more testing on this, but in Chrome, I don't even need to have a Google account in my Android accounts for the TTS to work. I have no idea where my voice just went (tested this) - whether it went to the cloud or not, and how it is going to be used to identify me in the future.

Chrome has an easier to understand feature here, because Chrome and Google Assistant/TTS are both Google branded, and it makes sense that when I give permission for Google Chrome to record my voice that Google TTS would also get permission -- it is kind of the reason why Google thought it was okay to log in to Google Chrome (kinda) when logging into Gmail on Chrome.

Here is another question: if I navigate to Mozilla's Common Voice and allow Fenix to record my voice, do I then allow Fenix to proxy my voice to Google TTS?

It works right now in Chrome - try it:

  1. Navigate to https://voice.mozilla.org/en/speak
  2. Tap the record button
  3. Give mozilla (the site) permission to record
  4. Give Chrome the permission to record
  5. Tap the address bar
  6. Tap the Mic

What happens:

Google TTS works!

I guess that is expected because of course you gave Google the permission to record your voice when you gave Mozilla the permission to do so.

I'm not saying that this is an easy problem to solve from a UX perspective; it is just a vital one considering the privacy implications and Firefox's general branding around privacy and tracking.

Of course, not making Google TTS available if the Google App is unavailable might make the problem a simpler one to deal with - I just don't know if it is a cop-out. It would solve the problem for me, but I would still be looking forward to the feature via Mozilla SpeakToMe.

It is worth noting that Edge continues to show me their mic icon even with my Google App disabled, likely because they ship their own TTS; I don't see a Google logo when using it, and it has a whole different UX. Fenix should do the same.

brampitoyo commented 4 years ago

@vesta0 wrote:

> what do you see as the downside of placing the voice icon in the search bar? I think it would be more visible/accessible there, and that is where users would expect to see it. Unless there is a specific reason we wouldn't want to add anything to the search bar.

I believe that our initial concern was related to thumb-reachability: putting search-related actions close to the keyboard, where your fingers are already typing.

Of course, there’s the “x” clear text button up there on the search bar. But you can accomplish the same task (just not as quickly) by holding the “Backspace” key.

@yoasif wrote:

> I think that Fennec maintained a pretty clean separation between the address bar being an input (voice, text, barcode) and the bottom area being a modifier for input data, and the Fenix design kind of muddles that separation.

I’d love to know if @Verdi, who designed our search UI, would have any opinion around this topic.

Fenix lets you pick a search engine right from the get-go, before you search for anything, and without having to tap the (let’s say) Google logo. This has the benefit of fitting with the user’s thinking pattern: “I want to search Amazon for hydro flasks”, “I want to read the Wikipedia page for African cuisine”, etc.

I think that optimising for “search before you type” and thumb-reachability might cause some limitations that keep voice and QR from appearing up top.

> It is worth noting that Edge continues to show me their mic icon even with my Google App disabled, likely because they ship their own TTS; I don't see a Google logo when using it, and it has a whole different UX. Fenix should do the same.

I’d be curious to hear what your proposal might be.

My observations are as follows:

* Google TTS is installed and activated in most Android phones

* Most users want voice search to “just work” without worrying about the provider, or tapping too many permission dialogs

* Mozilla doesn’t have a TTS service (if we have one, we can just support ours – this would solve the problem instantly)

A way out that I see might be to have a settings menu item to turn off the voice search feature permanently. Would that solve your problem?

yoasif commented 4 years ago

> @yoasif wrote:
>
> > I think that Fennec maintained a pretty clean separation between the address bar being an input (voice, text, barcode) and the bottom area being a modifier for input data, and the Fenix design kind of muddles that separation.
>
> I’d love to know if @Verdi, who designed our search UI, would have any opinion around this topic.
>
> Fenix lets you pick a search engine right from the get-go, before you search for anything, and without having to tap the (let’s say) Google logo. This has the benefit of fitting with the user’s thinking pattern: “I want to search Amazon for hydro flasks”, “I want to read the Wikipedia page for African cuisine”, etc.

Thank you for asking me to defend my assertion @brampitoyo - having taken a closer look at Fennec, I can see that Fenix indeed helps optimize for this use case and mental model. However, I think it is clear that the Fenix way makes it harder to make the decision afterwards - perhaps I am backwards in some ways, but I always know what I want to search for, but I often make the decision of where I want to conduct the search the split second before I enter the query.

On desktop for example, what I often do is -- Control-k (I have the separate search bar enabled), type a query, then tab, tab to DuckDuckGo or to Amazon, then press enter to perform the search.

I don't want to get too deep into the weeds on whether users generally think about where they want to search prior to entering their search query, but I think it is indisputable that searches are pretty worthless without a query, and that nearly all of the time a user has a specific query in mind, even if they don't have a search engine in mind.

All to say that Fenix currently makes it harder to change the search engine after entering the query, whereas in Fennec, the engine selection was in your face as you typed your query.

I think that here, I would make the following changes to Fenix:

> I think that optimising for “search before you type” and thumb-reachability might cause some limitations that keep voice and QR from appearing up top.

I think that we saw what Fennec did poorly and learned the wrong lesson - that because it was hard to select a search engine prior to doing a search, that we would go the whole way and optimize for that use case.

> It is worth noting that Edge continues to show me their mic icon even with my Google App disabled, likely because they ship their own TTS; I don't see a Google logo when using it, and it has a whole different UX. Fenix should do the same.
>
> I’d be curious to hear what your proposal might be.

I think it says great things about the Fennec team to observe that there is actually a good solution present there - I hadn't seen this before, but there is a toast that appears when the mic appears that says: "Your audio will be sent to Google to provide speech recognition service. A transcript will be shared with this app."

signal-attachment-2020-05-14-031114

Obviously the strings can be updated for Fenix, but it makes it clear that the audio is sent to Google, not simply transcribed by an on-device Google mic (I'd have few to no issues if this were the case). It also protects the user in the instance that the Google mic removes its branding to just be a mic icon instead, obscuring the fact that audio is sent to Google.

> My observations are as follows:
>
> * Google TTS is installed and activated in most Android phones
>
> * Most users want voice search to “just work” without worrying about the provider, or tapping too many permission dialogs
>
> * Mozilla doesn’t have a TTS service (if we have one, we can just support ours – this would solve the problem instantly)

I think you are right that most users want it to just work and to not worry about the provider, but the status quo today is that Google is heavily branding the experience, but it is not clear that the data is not processed on device (hey, I have a Google phone, it isn't that unlikely, and in fact, there is some offline functionality in Google Maps, for example) and it isn't "on brand" for Fenix to proxy my voice to Google's servers.

The fact that Google builds the OS and offers many services makes this very confusing, and we should also recall that people were surprised that people were listening to their conversations with Siri.

Of course my ultimate preference is a Mozilla service, even more preferably optionally offline (could be an in-app download) to allow for basic voice search.

If there is indeed an offline Google voice search that processes voices on device, there is no need for this warning, and if Fenix could opt into that by default and allow users to send their voices to Google optionally, there would also be no need for a warning.

Upon further digging, it looks like Pixels have this for en-us: https://techcrunch.com/2019/03/12/googles-new-voice-recognition-system-works-instantly-and-offline-if-you-have-a-pixel/

I understand the desire to include this in the product and have even agreed with Product on the placement of the functionality, but I continue to think that it would be a mistake to not educate users about what is happening to their voice. Informed consent matters, I think.

The reason that opting into a non-offline voice search wouldn't require a toast/warning is that the user is clearly informed and has consented.

> A way out that I see might be to have a settings menu item to turn off the voice search feature permanently. Would that solve your problem?

I think that parity with Fennec would work best here. Fennec has both the toast and the accessibility toggle to disable the mic icon (the scan icon too). I didn't realize that Fennec had the former feature, but it makes sense when I see it.

Some special behavior for Pixels in en-us to not show a toast and defaulting to not sending to the cloud would be fantastic.
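
The toast rule argued for above can be sketched as a small decision function. This is an illustration in Python, not Fenix code: `VoiceContext`, its fields, and `should_show_disclosure` are hypothetical names, and whether a given device really transcribes on-device is assumed to be known to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceContext:
    # Hypothetical flags; how they would be detected is outside this sketch.
    on_device_recognition: bool  # audio is transcribed locally and never uploaded
    cloud_opt_in: bool           # user explicitly chose to send audio to the provider


def should_show_disclosure(ctx: VoiceContext) -> bool:
    """Show the 'your audio will be sent to ...' toast only when consent is not already clear."""
    if ctx.on_device_recognition:
        # Nothing leaves the device, so there is nothing to disclose.
        return False
    # Audio goes to the cloud: warn unless the user already opted in knowingly.
    return not ctx.cloud_opt_in
```

With these assumptions, a Pixel using offline recognition would see no toast, while a default setup that uploads audio would see it until the user opts in.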

yoasif commented 4 years ago

Oh, and I forgot to mention - having a toast would also allow Fenix to offer voice search even when Google isn't the active search engine.

sblatz commented 4 years ago

I already had a finalized solution before I saw this new "button" styling, @brampitoyo. I'm wondering if we should just put it in the search box as a first attempt and see how people like that?

For what it's worth, nearly every other major service has the voice search icon in the top right corner, so I'd reiterate what Vesta said: that's where people would expect it to be. I'm not sure we should break an Android paradigm here unless we have a really strong reason to.

For example:

Android Home Screen

image

Chrome

image

Messages:

image

vesta0 commented 4 years ago

@sblatz I recommend moving forward with your implementation. Having the mic in the address bar is the dominant UX on Android and there isn't a compelling reason to differentiate here.

@brampitoyo we can for sure discuss future iterations of this later on.

brampitoyo commented 4 years ago

@vesta0 For sure. Our first iteration as @sblatz has it looks good to me!

sblatz commented 4 years ago

This is enabled behind a nightly/debug feature flag so it can bake for a week or two. QA please verify it's working as expected 😄

yoasif commented 4 years ago

@sblatz @vesta0 @brampitoyo Trying out the voice feature in the latest build and it looks great. The only thing missing from what we discussed is the toggle in the accessibility settings.

However, the Fennec-alike toast is present, which clarifies the privacy situation immensely.

The only nit that I see is that the mic appears even after I have entered text via typing; this is bad because voice input doesn't add to the address bar, it replaces the input. It would be best to emulate other browsers and to only show the mic when the address bar is free of text.

Should I file this separately?
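
The visibility rules discussed in this thread (a settings toggle, no mic when no recognizer is available, and no mic once text has been typed) can be combined in one check. A minimal sketch in Python, not the Fenix implementation; `should_show_mic` and its parameters are hypothetical names.

```python
def should_show_mic(
    address_bar_text: str,
    voice_search_enabled: bool,
    recognizer_available: bool,
) -> bool:
    """Decide whether the mic icon belongs in the address bar right now."""
    # Respect the user's setting and hide the icon if nothing can handle speech input.
    if not (voice_search_enabled and recognizer_available):
        return False
    # Voice input replaces the field's contents, so only offer it on an empty field.
    return address_bar_text == ""
```

Under this rule the icon disappears as soon as the user types a character and comes back when the field is cleared.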

yoasif commented 4 years ago

> The only nit that I see is that the mic appears even after I have entered text via typing; this is bad because voice input doesn't add to the address bar, it replaces the input. It would be best to emulate other browsers and to only show the mic when the address bar is free of text.
>
> Should I file this separately?

I filed this: https://github.com/mozilla-mobile/fenix/issues/10829

sblatz commented 4 years ago

> The only thing missing from what we discussed is the toggle in the accessibility settings.

I added it to the search settings screen. @brampitoyo is this an okay place for this preference to live? I figured since it's directly related to search it makes sense here, but please chime in with your thoughts!

image

yoasif commented 4 years ago

@sblatz I think it is fine either way - I just wanted it available. Thanks for clarifying!

AndiAJ commented 4 years ago

Hi, I've just checked this matter on the latest Nightly Build 200522 from 5/22 using the following devices:

• Google Pixel 3a (Android 10)
• Huawei Mate 20 Lite (Android 9)
• OnePlus A3 (Android 6.0.1)

✔️ I can initiate a voice search from the search bar and widget
✔️ I can turn this option off when I go to the Accessibility menu in settings
❓ The first time a user taps on the mic icon, they should see a permission prompt - You have to disable the microphone permission for Google and only afterwards the prompt gets displayed

► Video 20200522-115354

@sblatz - Not sure about the prompt behavior, could you please review and advise? Everything else works properly, great job! ☺️

I'll remove the QA needed label until further notice.

sblatz commented 4 years ago

I believe the prompt is working as expected here. It should have the same behavior as our search widget in this regard, so I will close this :)