whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.11k stars 2.67k forks source link

Make "triggered by user activation" match browser behavior #1903

Closed jyasskin closed 4 years ago

jyasskin commented 8 years ago

"Triggered by user activation" currently behaves differently from the way browsers handle user gestures. For example, Chrome only propagates a gesture through a single setTimeout, and doesn't propagate through XHR or Promise.then() at all.

We're discussing this in https://crbug.com/404161.

domenic commented 8 years ago

I guess this a more accurate description of the problem statement than https://github.com/whatwg/html/issues/1358, so let me close that one in favor of this.

RByers commented 8 years ago

Note that some work is happening in blink to try to make the "user gesture" notion tied to a Document so that, for example, sensitive operations like navigation can be performed only after the user has intereacted with that frame specifically. It's probably too early to know for sure that this prove useful, but at least Domenic and Nate should probably chat to make sure we're not heading in different directions here.

annevk commented 8 years ago

How does setTimeout propagation happen? You just annotate the task queued with whatever the current task is annotated with? And I guess it has to be some kind of special value that is checked first so you don't continue to propagate it.

jyasskin commented 8 years ago

@annevk Pretty much: https://cs.chromium.org/chromium/src/third_party/WebKit/Source/core/frame/DOMTimer.cpp?q=DOMTimer::DOMTimer&sq=package:chromium&l=82&dr=CSs

upsuper commented 7 years ago

What about key events? It seems to me Chrome and WebKit accept at least keypress event as a user gesture, but Firefox doesn't. Is there any concern about allowing that from the spec side, or it is just the spec being a bit behind?

annevk commented 7 years ago

@upsuper I think that would be possible. Just keypress or also keydown/keyup? (I guess it's the task that triggers them, but nobody has defined those in terms of tasks yet...)

upsuper commented 7 years ago

@annevk Also keydown and keyup.

domenic commented 7 years ago

https://jsbin.com/jawuqeq/1/edit?html,console,output is evidence that Chrome's heuristics, of propagating the user gesture, are perhaps not necessary for web compat. To see this note how clicking the "Run with JS" button causes the popup to be shown in Chrome, but not in any other browser. Presumably this is because Chrome is propagating the click of that button through to the popup-creation, whereas other browser are not.

domenic commented 7 years ago

There is also discussion of this in https://github.com/WICG/interventions/issues/12 and some brief discussion with @bzbarsky in https://github.com/whatwg/html/pull/2292.

At this point it would be extremely helpful if we could get info from other browsers on how their algorithms work. @jyasskin has outlined a proposal based on Blink's heuristics at https://docs.google.com/document/d/1csmMqHr60zded2tdjczr6HrL18iOPR1PlnM2FqCDLOQ/edit#heading=h.j0mdilrjre5y but I'm hesitant to try to codify that in the spec without any knowledge of how close or far away it is from other engines.

bzbarsky commented 7 years ago

// cc @smaug----

At first glance in Gecko we have a concept of "handling user input" which is based on certain events (e.g. key events, but not for some keys), which is propagated through some async things (form submission is the obvious one; not sure there's anything else really). Oh, and it times out after a pref-configurable time (defaulting to 1s), so if your event listeners take longer than that, you're no longer considered to be handling user input. This is used in various placed including at least media element code, certain controls on when navigations can happen, fullscreen-allowed checks, certain execCommand (cut/copy) bits, probably others.

There's also a separate concept of "popup blocker state". This is somewhat based on the "handling user input" state but with some additional restrictions and maybe some loosenings (e.g. some trusted keyboard events that may affect popup blocker state may not affect "handling user input state" at first glance). The popup blocker state propagates through more async stuff, including javascript: evaluation, form submission, link clicks (which for example do their targeting of a load async in Gecko), setTimeout/setInterval (but only one level deep, and only if the delay of the timeout/interval is less than a pref-configurable value defaulting to 1s).

Oh, and the popup blocker state is not a boolean. It can actually take on 4 values, which affect whether this particular popup can open, as well as whether future popups will be able to get opened, and whether certain other non-window-open operations that use the popup state can happen.

The popup state is used for window opening, obviously, but also allowing/disallowing focus() calls. Not sure where else.

RByers commented 7 years ago

One notion that's important to chromium which it sounds like Gecko has too is the restriction which permits only a single window.open call to succeed for each instance of user input. I.e. dragging a mouse can open a popup from a mousedown, mousemove or mouseup listener, but only the first such window.open call will succeed across all of these (so an abusive site can't open a whole slew of popups for a single mouse drag operation). Perhaps we could start by working that into the spec somehow?

For the other details, I'm willing to try simplifying blink's behavior here to better match the spec, especially if there's another implementation who seems to be getting away (from a web compat perspective) with a simpler / more rational model. It's expensive (but not impossible) to figure out the best trade-off by trial and error, but it should be relatively cheap to switch to something simpler that's already successful in some other engine. IIRC Edge folks said they had a much simpler model which seemed to be working OK for them.

bzbarsky commented 7 years ago

Gecko's behavior there is different from Chrome's. In Gecko, this:

document.body.onclick = function() { window.open(); window.open(); }

will open two windows on click. There are various other heuristics Gecko does around popup blocking, but I'm pretty sure we don't track it across events. That said, we don't allow popups from mousemove at all. And popups from click are treated differently in various ways from popups from mouseup/mousedown/dblclick.

Oh, and in Firefox a lot of this stuff (e.g. which events allow popups at all) is user-configurable.

RByers commented 7 years ago

will open two windows on click.

Oh, interesting. I don't think our security team would be OK with that - we've seen malicious sites which get a user to click somewhere, and then open 100s of pop-up windows when they do. We're even hoping to fix the case where a tap on a touchscreen currently can open two pop-ups (one for touchstart, one for click), although since that's bounded to two so it's considered a pretty low severity bug.

bzbarsky commented 7 years ago

and then open 100s of pop-up windows when they do

We do limit the number you can open on click. The limit is just larger than 1. And a bit dynamic depending on past behavior.

I wasn't suggesting you implement the Gecko behavior here; just saying that there is a wide range of what people consider ok popup blocker behavior....

RByers commented 7 years ago

Yep, understood - thanks for the details. I'm just trying to figure out how best to converge our implementations and leaning towards changing chromium to match others when possible :-)

RByers commented 7 years ago

Filed this chromium bug to track efforts to better align our implementation with the spec (one way or another).

mystor commented 7 years ago

I imagine that for propagating the "triggered by user activation" attribute across callbacks, we'd want to do something like the following:

  1. Propagate into promise callbacks triggered within an event "triggered by user activation"
  2. Consider (some?) IDB callbacks to be "triggered by user activation" if they were requested within an event "triggered by user activation"
  3. Consider setTimeout callbacks to be "triggered by user activation" if they were scheduled with, say, <20ms as their timeout value from another callback "triggered by user activation"?
  4. Consider XHR callbacks to be "triggered by user activation" if the request was made within the a callback "triggered by user activation" (perhaps with a short-ish timeout (1s?) so servers can't feed us bytes slowly to get an arbitrary timeout with XHR callbacks).
  5. Potentially other places which I didn't immediately think of.
domenic commented 7 years ago

Hey @mystor,

In Chrome we're investigating a maximally-simple solution, per this design doc. I guess it hasn't been mentioned on this thread yet.

The TLDR version: a user activation sets a bit, indefinitely, which will be consumed next time some user-activation-requiring action happens. We're prototyping that now as we hope it will be a great simplification to our current architecture, but we wanted to make sure it was web compatible before proposing it too concretely.

I'd love to get your thoughts on that kind of setup, assuming that we find it to be web compatible.

mystor commented 7 years ago

@domenic I worry about the indefiniteness of this flag, especially for certain APIs. For example, it would be problematic if a document received a user input event, waited a few minutes for the user to stop interacting with their computer, and then went fullscreen trying to mimic the browser/OS UI for phishing.

I imagine we could sort the APIs into security-sensitive and annoying ones, and put the security-sensitive ones behind a timer (e.g. the fullscreen API). I think this is fairly similar to what we have in gecko right now, where we don't have a timer for any of our APIs except for the fullscreen one (which must be triggered within 1s of the user input event. The merely annoying ones potentially could avoid/have a much longer timeout (e.g. clipboard APIs, popup?, autoplay).

cc @smaug---- and @ehsan who have had opinions about this flag in gecko in the past.

bzbarsky commented 7 years ago

I have the same worry as @mystor. This is why Gecko right now unsets the "in user activation" flag after 1s even if we're still handling the user input event (e.g. one of the listeners did a 1s busy-loop). The idea is that from the user's point of view there should be a clear connection between the action they took and whatever it is the page is trying to do.

mustaqahmed commented 7 years ago

In our proposed design, we wanted to have all APIs consume user activation, to minimize the effect of indefinite lifespan. I agree that it's debatable but since a single user activation can be used at most once, doesn't it minimize the "indefiniteness" worry?

By the way, both Chrome & Edge currently consume the activation with popups, but neither Firefox nor Safari do. And this is just one of many differences for popups we have today.

sach2211 commented 6 years ago

Wanted to add one more API for consideration => credential management api -- Where the promise triggers an account chooser UI, but resolves ( or rejects) only after user selects (or rejects) the dialog box. This can easily take up more than a few seconds.

mustaqahmed commented 6 years ago

@sach2211: yes, this looks like a valid use case that requires no-auto-expiry. I think if we confine the "reach" of a user activation through consumption, a time limit could be avoidable.

Just discovered that Safari 11 (I tried 11.0.1) has started consuming user activation for popups, matching Chrome and Edge.

mustaqahmed commented 6 years ago

A quick note, somewhat related: I have created a table of "activation defining events" for Chrome, Edge, Firefox and Safari, each for both mobile and desktop, which shows more inconsistencies (at least more than what I was hoping to find):

I will create a separate github issue to discuss the event-set, to keep the modeling discussion (this issue) separate. We have separate Chrome issues for them already: fixing event-set vs User Activation v2 model.

smaug---- commented 6 years ago

So the test is about window.open (meaning it is about popup blocking in Gecko) ?

bzbarsky commented 6 years ago

@mustaqahmed Firefox has at least three different concepts of activation that have different rules applying to them. window.open uses one of those concepts, but other things use the other ones...

mustaqahmed commented 6 years ago

@smaug----: Sorry missed your question in between my vacation. Yes, I tested only popups.

@bzbarsky: In FF, I had noticed a difference between window.open() and navigator.vibrate() but I ruled it out as a bug. Did you mean FF has three different event sets for different "classes" of APIs? If so, is it by design (vs because of isolated/incremental implementation)?

bzbarsky commented 6 years ago

Did you mean FF has three different event sets for different "classes" of APIs?

Not just event sets but also propagation rules (e.g. across the event loop, across time within a single event, etc).

If so, is it by design (vs because of isolated/incremental implementation)?

Well, the reason we have multiple different definitions is that the heuristics used by some use cases were judged wrong for other use cases... I'm not sure how to classify that in your classification.

But I would like to point out that the threat model of "a popup opens when the user did not expect it to" is quite different from "a site gets access to the camera when the user did not expect it to" or whatnot. So some things, especially new APIs, may want a more stringent definition of "activation" while others (popup blocking was the poster child here) may need a looser one for backwards compat.

mustaqahmed commented 6 years ago

But I would like to point out that the threat model of "a popup opens when the user did not expect it to" is quite different from "a site gets access to the camera when the user did not expect it to" or whatnot. So some things, especially new APIs, may want a more stringent definition of "activation" while others (popup blocking was the poster child here) may need a looser one for backwards compat.

I totally agree that we need to handle user-API-specific requirements, we have those in Chromium too. What I wanted to emphasize is that the details were developed over a long period of time w/o a guiding spec, so we can expect to have some rough edges (e.g. a vague or unintended behavior) that would need some polishing. E.g. here is one such fix we did in Chromium, and I won't be surprised if FF has seen similar issues.

In any case, we would like to have a well-defined set of API-specific variations for the Web, with not-too-many possibilities, right? Otherwise, developers and users would be confused.

This discussion sets the ground for our first major update about User Activation v2 in Chrome, stay tuned (while I update a few things based on @bzbarsky's comment above).

mustaqahmed commented 6 years ago

User Activation v2 is now available in Chrome for testing. Please give it a try and let us know your feedback. You can enable the feature in M67+ through chrome://flags/#user-activation-v2 or the command-line flag --enable-features=UserActivationV2.

Here is a short explainer with demos: mustaqahmed.github.io/user-activation-v2.

Plans for this quarter:

smaug---- commented 6 years ago

The more I think this, the more I start to like the approach where HasConsumableUserActivation is propagated through promises and setTimeout etc. and not stored on document or window. The global HasConsumableUserActivation really doesn't work with cases when there is for example the button to accept cookies and then page also has a video. Sure, even when propagating through callbacks, one could always just start the playback when cookie button is activated, but at least that would require explicit somewhat user hostile behavior from the web page.

User activation v2 is simple to implement, but I'm not sure it gives the best user experience.

RByers commented 6 years ago

I agree explicit propagation through all async APIs is able to provide a better user experience in general.

But despite chromium having such a system for years (ableit with some bugs/limitations which were hard to address), there has been little investment from other engines in implementing something similarly powerful. So we've seen a lot of frustration from developers in not being able to rely on or generally reason about the behavior (both due to edge cases in Chrome, and do to huge differences cross-browser), and as such the promised user experience benefits haven't really materialized across the web (things are just as often broken for users).

Therefore we decided in chromium to focus on testing the hypothesis that the simpler model could provide a typical user experience that's generally as good in practice, while being possible to achieve interoperability around.

I personally don't have any interest in continuing to invest in the more complex causality-tracking solution unless either at least two other engines have successfully shipped such a model, or the simpler model has proven to have negative UX consequences which are unacceptable to Chrome.

smaug---- commented 6 years ago

Well, I think the issue here is that there hasn't been any kind of proper spec around this. And also, the web has changed a lot over the years. Gecko's popup blocking code for example is mostly from FF1.0 era (written by jst ;) ). The web isn't the same it was in 2004.

Chrome had a partial implementation to propagate the flag, but how could others have followed that without a spec?

mustaqahmed commented 5 years ago

User Activation v2 shipped in Chrome 72, and it successfully went through the whole Stable release without any major issues, yayy! :birthday:

We encountered a few minor issues, added only one small tweak to the original model which we expect to remove (partially/completely) in future. Here is a summary: