User gestures

AshleyScirra commented 1 year ago

Description

User gestures (aka user activation) restrict certain sensitive actions, such as opening a popup window, so that they are only allowed in response to a user gesture, i.e. some kind of user input event. However, browsers currently differ significantly in how they implement this, which causes awkward compatibility problems.

Rationale

The model implemented by Chrome works well: there are essentially two flags and a short timeout. Other browsers do not support this though, only considering synchronous code run inside a user input event as a user gesture.
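To make "two flags and a short timeout" concrete, here is a minimal sketch of a timestamp-based activation model, loosely following one reading of the HTML spec's "tracking user activation" section. The class name `ActivationState`, its method names, and the 1000 ms duration are assumptions for illustration only; this is not any browser's actual implementation.

```javascript
// Sketch of a timestamp-based activation model; not any browser's real code.
const TRANSIENT_ACTIVATION_DURATION_MS = 1000; // assumed value; browsers choose their own

class ActivationState {
  constructor() {
    // +Infinity means "the user has never interacted with this window".
    this.lastActivationTimestamp = Number.POSITIVE_INFINITY;
  }

  // Called when an activation-triggering input event (e.g. a click) arrives.
  notifyActivation(now) {
    this.lastActivationTimestamp = now;
  }

  // Sticky: the user has interacted at some point; this never resets.
  hasStickyActivation(now) {
    return now >= this.lastActivationTimestamp;
  }

  // Transient: the user interacted recently and nothing has consumed it yet.
  hasTransientActivation(now) {
    return (
      now >= this.lastActivationTimestamp &&
      now < this.lastActivationTimestamp + TRANSIENT_ACTIVATION_DURATION_MS
    );
  }

  // Consuming ends the transient window early but keeps sticky activation.
  consume() {
    if (this.lastActivationTimestamp !== Number.POSITIVE_INFINITY) {
      this.lastActivationTimestamp = Number.NEGATIVE_INFINITY;
    }
  }
}
```

In this sketch nothing depends on whether the caller is still inside the input event handler: only elapsed time and consumption matter, which is why an `await` or a `postMessage()` round trip does not by itself lose the gesture.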

This causes problems such as:

difficulty running async code for input events (an await can mean you lose the gesture)
difficulty delegating work to Web Workers (postMessage can mean you lose the gesture)
blocking use of new APIs like OffscreenCanvas (in a web worker, you never have a user gesture)
causing other browsers to implement case-by-case exceptions e.g. allowing user gestures to propagate with certain APIs, but not others; or modifying the design of APIs (e.g. Clipboard API) to solve a similar problem

Chrome's model solves all these problems, assuming the wait is not too long. In many cases that is perfectly sufficient. For example, converting a Blob to an ArrayBuffer is async but will likely complete very quickly for small blobs; in Chrome a user gesture can still be used afterwards, but in other browsers it cannot.
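The Blob case can be sketched as follows. The function name `openPreviewAfterBlobRead` and the `openWindow` parameter (a stand-in for `window.open`, injected so the handler can be exercised outside a browser) are hypothetical; only `Blob.prototype.arrayBuffer()` and the null return of a blocked `window.open()` are real platform behavior.

```javascript
// Sketch only: `openWindow` stands in for window.open.
async function openPreviewAfterBlobRead(blob, openWindow) {
  // This await ends the synchronous part of the input event handler.
  const buffer = await blob.arrayBuffer();

  // Under a timeout-based model this call can still be allowed if the read
  // was quick; under a synchronous-only model the gesture is already gone
  // and the popup is blocked (window.open then returns null).
  const popup = openWindow("about:blank");

  return {
    bytesRead: buffer.byteLength,
    popupOpened: popup !== null && popup !== undefined,
  };
}

// Possible browser usage (not executed here):
// button.addEventListener("click", () =>
//   openPreviewAfterBlobRead(someBlob, (url) => window.open(url)));
```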

Specification

"Tracking user activation" in the HTML spec: https://html.spec.whatwg.org/multipage/interaction.html#tracking-user-activation

Tests

https://wpt.fyi/results/html/user-activation

foolip commented 1 year ago

Thanks for proposing this, @AshleyScirra!

blocking use of new APIs like OffscreenCanvas (in a web worker, you never have a user gesture)

Can you say more about this? Are there any OffscreenCanvas APIs that require a user gesture?

cc @mustaqahmed who's done a lot of work on user gestures in Chrome.

AshleyScirra commented 1 year ago

We have a game engine that supports running entirely in a web worker with OffscreenCanvas. In our architecture input events are forwarded to the worker via postMessage(), it runs the entire engine and all logic in the worker, and then it posts back for calls to any APIs not available in a worker (window.open, WebRTC, etc...)

However this means calls are never synchronously in a user input event, and so in some browsers we lose the ability to use any user-gesture limited APIs at all. So we have to turn off using a web worker and stop using OffscreenCanvas, just so we can keep within the user gesture rules. I know OffscreenCanvas isn't supported outside of Chromium yet, but even when it is, this is a blocking issue for us to switch to using it.
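A rough sketch of the main-thread half of such an architecture is below. The message shapes (`kind: "input"`, `kind: "call-api"`), the function name `createMainThreadBridge`, and the `mainThreadApis` table are all hypothetical and are not taken from the engine described above; the point is only that the API call happens in a message handler, after the original input event handler has returned.

```javascript
// Main-thread side of a hypothetical worker-hosted engine (names are illustrative).
function createMainThreadBridge(worker, mainThreadApis) {
  // Forward raw input events to the worker; the main thread does not interpret them.
  function forwardInput(event) {
    worker.postMessage({
      kind: "input",
      type: event.type,
      x: event.clientX,
      y: event.clientY,
    });
  }

  // Run an API call the worker asked for. By the time this runs, the
  // original input event handler has already returned.
  function handleWorkerMessage(message) {
    const { kind, api, args } = message.data;
    if (kind !== "call-api") return undefined;
    if (!Object.prototype.hasOwnProperty.call(mainThreadApis, api)) {
      throw new Error(`Unknown main-thread API: ${api}`);
    }
    return mainThreadApis[api](...args);
  }

  return { forwardInput, handleWorkerMessage };
}

// Possible browser wiring (not executed here):
// const bridge = createMainThreadBridge(worker, {
//   openWindow: (url) => window.open(url),
// });
// canvas.addEventListener("pointerdown", bridge.forwardInput);
// worker.onmessage = bridge.handleWorkerMessage;
```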

foolip commented 1 year ago

I see, so an example might be when there's a "fullscreen" button in the game's UI, and you need to actually run the game engine to know that it was clicked, and then do the element.requestFullscreen() call on the main thread. Is that the kind of thing that's made complicated by the model in some browsers?

mustaqahmed commented 1 year ago

To the best of my knowledge, most other browsers use the old HTML model that required calling activation-gated APIs synchronously in appropriate event handlers, or asynchronously after an (undefined) propagation. That undefined part was known to have fundamental problems, and every browser (including pre-M72 Chrome) added unpredictable workarounds to make it work. The primary motivation for the current HTML model was to support async usage of user activation in a well-defined manner that is agnostic to any propagation scheme in JS.

AshleyScirra commented 1 year ago

I see, so an example might be when there's a "fullscreen" button in the game's UI, and you need to actually run the game engine to know that it was clicked, and then do the element.requestFullscreen() call on the main thread. Is that the kind of thing that's made complicated by the model in some browsers?

Yes, exactly. It might seem a roundabout approach to making the call, but if you make middleware/game engines/libraries that run in a worker, you can end up with that situation; by the time you try to call requestFullscreen() you've lost the user gesture. I suppose a similar problem may arise with trying to host WebAssembly builds of codebases in a worker. The only good workaround is to stop using a worker and run everything on the main thread again.

I guess this is a variant of "difficulty delegating work to Web Workers", but I called it out separately as giving up on running code off main thread is a pretty significant consequence.

tbondwilkinson commented 1 year ago

User gestures is a P2 for Google's Closure Library

"Allows more work to be done asynchronously, which unblocks migration to Promise."

marcoscaceres commented 1 year ago

WebKit implements the "new" model in HTML, and has for a long time.

About the problem @AshleyScirra outlines, understood. I guess it would be good to see some examples. WebKit, for instance, allows quite a bit of time before consuming the activation (this obviously shouldn't be relied on and can change without notice!).

So, I guess I'm wondering why one would do:

User clicks.
Send message to worker.
await response
requestFullscreen()

Instead of:

User clicks.
requestFullscreen() + do anything else that requires transient activation.
Send message to worker.
await response
exit full screen if something went wrong.
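The second sequence could look roughly like the sketch below. `enterFullscreenOptimistically` and the `askWorker` callback are hypothetical names; `element.requestFullscreen()` and `document.exitFullscreen()` are the standard Fullscreen API calls, both assumed here to return promises as they do in current browsers.

```javascript
// Sketch of "request first, undo later"; `askWorker` is a hypothetical
// function that resolves to true if the worker confirms the intent.
async function enterFullscreenOptimistically(element, doc, askWorker) {
  // Issued synchronously inside the click handler, before any await.
  const fullscreenRequest = element.requestFullscreen();

  // Wait for both the worker's verdict and the fullscreen transition.
  const [confirmed] = await Promise.all([
    askWorker({ type: "click" }),
    fullscreenRequest,
  ]);

  // Undo the optimistic request if the worker did not want fullscreen.
  if (!confirmed) {
    await doc.exitFullscreen();
  }
  return confirmed;
}
```

This only works when the main thread already knows which gated API a given click might need, since the request has to be made before the worker has been consulted.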

marcoscaceres commented 1 year ago

Just so we have some more context, the APIs in WebKit that either consume user activation or check for transient activation are:

Wake Lock
WebXR
Fullscreen
HTMLInputElement
Notifications
Push API (consumes)
Web Share (consumes)
Payment Request (consumes)

So the discussion needs to be framed around when the above are a problem (the list is not exhaustive, other engines may support more).

AshleyScirra commented 1 year ago

@marcoscaceres - it doesn't look to me like Safari does support this. This WebKit bug is still open and some of the cases, such as reading a blob, still prevent a user gesture from working based on a quick test now in Safari 16.1.

As for the requestFullscreen workaround, middleware like Construct's game engine, and probably also anything compiled to WebAssembly and running in a worker, doesn't know what the intent is with any given input event. It uses an architecture where it simply forwards all input events to the worker, runs logic, and then posts messages back to the DOM to run any APIs not available in a worker. We can't request fullscreen and then cancel it on every single input event just in case it does want to enter fullscreen - it would be unusable with constant flickering for the user.

marcoscaceres commented 1 year ago

@marcoscaceres - it doesn't look to me like Safari does support this.

I still don't know what "this" is exactly (I assume it's "the model implemented by Chrome", or is it https://mustaqahmed.github.io/user-activation-v2/ ?)

WebKit supports what's in the HTML spec today, which sounds like it differs from the model implemented by Chrome?

Is there some other proposal/PR to HTML that you are referring to that specifies that proposal?

(Hi @mustaqahmed, do you know?)

We can't request fullscreen and then cancel it on every single input event just in case it does want to enter fullscreen

Understood. It sounds like we need some kind of thing that requests more time or somehow links a set of operations started by a user gesture.

AshleyScirra commented 1 year ago

I still don't know what "this" is exactly (I assume it's "the model implemented by Chrome", or is it https://mustaqahmed.github.io/user-activation-v2/ ?)

Yes - that. As far as I understand it, Chrome's model follows what is now in the spec.

WebKit supports what's in the HTML spec today

I'm really confused by this remark because it appears to me WebKit has never supported the latest spec on this and it still looks like it doesn't in Safari 16.1. I've never seen it referenced in Safari's release notes, which I have closely followed for years. The WebKit issue is still open, and I would have assumed it would be closed if it was supported.

Try the second button ("Open popup after async blob read") in this test case: https://downloads.scirra.com/labs/safariusergesture/index.html

It successfully opens a popup in Chrome, because its user gesture rules allow a ~1 second timeout. It does not work in Safari, which I always thought was because it never updated to the latest spec around user gesture rules: the fact there is an await means you lose the user gesture. If it had implemented the latest user gesture rules, then this would work, because it complies with the short timeout period. So it seems either Safari does not support the latest spec, or it does but it's broken in cases like this; either way my proposal is still necessary in order to clear up cross-browser inconsistencies like this.

gsnedders commented 1 year ago

I still don't know what "this" is exactly (I assume it's "the model implemented by Chrome", or is it https://mustaqahmed.github.io/user-activation-v2/ ?)

Yes - that. As far as I understand it, Chrome's model follows what is now in the spec.

I don't actually know what exactly Chrome implements today, but the large spec change happened in https://github.com/whatwg/html/commit/8f8c1f50158736b3cf16188377a0974a20367c8b. This largely split the prior "user activation" into transient and sticky activation, and defined which events are considered user activation.

WebKit supports what's in the HTML spec today

I'm really confused by this remark because it appears to me WebKit has never supported the latest spec on this and it still looks like it doesn't in Safari 16.1. I've never seen it referenced in Safari's release notes, which I have closely followed for years. The WebKit issue is still open, and I would have assumed it would be closed if it was supported.

https://github.com/WebKit/WebKit/commit/941560c4313d7ece0c04a1cdee99f4a67e8c2fbb added transient activation; this shipped in Safari 14.

Try the second button ("Open popup after async blob read") in this test case: https://downloads.scirra.com/labs/safariusergesture/index.html

It successfully opens a popup in Chrome, because its user gesture rules allow a ~1 second timeout. It does not work in Safari, which I always thought was because it never updated to the latest spec around user gesture rules: the fact there is an await means you lose the user gesture. If it had implemented the latest user gesture rules, then this would work, because it complies with the short timeout period. So it seems either Safari does not support the latest spec, or it does but it's broken in cases like this; either way my proposal is still necessary in order to clear up cross-browser inconsistencies like this.

I think the problem is that we didn't uniformly move all existing usage of user gestures over to transient activation—including window.open. This probably requires a pretty different set of tests.

AshleyScirra commented 1 year ago

I'm still confused - https://github.com/WebKit/WebKit/commit/941560c4313d7ece0c04a1cdee99f4a67e8c2fbb appears to refer to using Web Share after AJAX - one of Safari's many API-specific carve-outs - rather than a general implementation of the modern user activation model.

Web content can feature detect the modern user activation model via navigator.userActivation (which appears to be in the HTML spec here). This exists in Chrome but is undefined in Safari 16.1. So it still looks to me like Safari does not support it...

EdgarChen commented 1 year ago

Gecko has implemented the behavior defined in the spec, but we have not yet implemented navigator.userActivation (bug 1791079). I think that is the main reason for the WPT failures on Gecko.

But our window.open hasn't migrated to the spec model yet, and I agree with https://github.com/web-platform-tests/interop/issues/142#issuecomment-1293397511 that it probably requires a pretty different set of tests. Maybe we could split window.open into a different proposal?

mustaqahmed commented 1 year ago

What @gsnedders and @EdgarChen mentioned above matches Chrome's experience: lots of APIs, both exposed and Chrome internal, rely on user activation, and Chrome needed quite a bit of time to migrate all to UAv2!

Good point about navigator.userActivation interface: exposing this allows testing the core implementation easily. Note, however, the individual "user APIs", like Fullscreen and Popups, need their own WPTs.

mustaqahmed commented 1 year ago

Is there some other proposal/PR to HTML that you are referring to that specifies that proposal?

(Hi @mustaqahmed, do you know?)

I think part of the confusion here is between:

A. The core user activation implementation as per the HTML spec (which now includes navigator.userActivation), and

B. How dependent APIs use it---every API here needs to fix its spec and add WPTs to clarify its reliance on core user activation model (say, transient vs sticky).

For progress tracking on B, we have meta issue https://github.com/whatwg/html/issues/5129 for external spec changes only, and now I think we need a similar one in this repository for WPTs. Thoughts?

marcoscaceres commented 1 year ago

@EdgarChen, about the tests (mostly related to UserActivation), please see the discussion in: https://github.com/web-platform-tests/wpt/issues/36727

We've identified a bunch of non-conforming things in the tests during implementation of the API in WebKit. Would love your (or other Gecko folks') input there!

Cases in point:

window.open() and .close() are not specified to consume user activation.
The Fullscreen API is not specified to consume user activation.

marcoscaceres commented 1 year ago

@AshleyScirra wrote:

I'm still confused - https://github.com/WebKit/WebKit/commit/941560c4313d7ece0c04a1cdee99f4a67e8c2fbb appears to refer to using Web Share after AJAX - one of Safari's many API-specific carve-outs - rather than a general implementation of the modern user activation model.

No, Web Share is using the V2 model in WebKit. The problem is that the V2 model itself can't cater for the case outlined in the bug: the v2 model appears to not handle that case (not WebKit's fault).

We need a Model v2.1 that can extend the timeout before transient activation expires or something. We should discuss that elsewhere, however.

Web content can feature detect the modern user activation model via navigator.userActivation (which appears to be in the HTML spec here). This exists in Chrome but is undefined in Safari 16.1. So it still looks to me like Safari does not support it...

Again, no. This is a completely incorrect assumption. You are confusing transient activation, the UserActivation API, and APIs still using the "old model". These are completely different (but related) things.

As @gsnedders mentioned, some APIs are using transient activation in WebKit (my list in https://github.com/web-platform-tests/interop/issues/142#issuecomment-1291754576) - others are using "the old model": this is likely the case for all browsers.

I filed this bug on WebKit to track where the "old model" is used in WebKit that MAY need to be migrated over: https://bugs.webkit.org/show_bug.cgi?id=247159

Hopefully that clarifies things.

AshleyScirra commented 1 year ago

Forgive me but I'm still trying to understand the situation here.

What I need as a web developer is:

All browsers to have a general implementation of the V2 activation model for all input types and all APIs. The test case I provided, where a short await prevents the ability to use window.open() in Safari, is exactly the type of cross-browser inconsistency I'd like to see cleared up, and is the reason for this proposal. It looks to me like the V2 activation model should cover this case, based on the HasConsumableUserActivation bit being set for a timeout after the input event, but it doesn't work in this specific case and probably others.
A way to feature-detect the V2 activation model. Otherwise how do we know we can do something async before using a user-gesture limited API? We have some clunky workarounds and sometimes we need to know ahead of time if they ought to be used. The presence of navigator.userActivation lets us feature detect this. (I did not originally mention this in the proposal - perhaps I should have done, but I think I originally assumed that this was the same as point 1, which perhaps was an incorrect assumption.) Otherwise we have to fall back to parsing user agent strings and all the pitfalls of that.

If the V2 model is insufficient for some specific cases, I'm happy for browsers to extend it at their discretion - after all, the wording of the V2 model refers to "an expiry time defined by the browser" which could be interpreted in a suitably flexible way. I don't think I really understand what Safari's specific exemptions around AJAX and Web Share are beyond the V2 model. But what I'm really after is the V2 model to be a minimum baseline that can always be relied upon for all APIs. Any exceptions beyond that are welcome! It only makes it easier to use user-gesture limited APIs.

So if I understand correctly, my proposal would involve using V2 activation consistently for all APIs as a minimum guarantee, completely removing the "old model", and implementing navigator.userActivation as a feature detection signal that the V2 model can be relied upon.

I apologise for any frustration in trying to communicate this but user activation inconsistencies have been a pain point for us for years now and I'm keen to make sure it is clear what web developers need and what solving this would mean in the eyes of a web developer.

marcoscaceres commented 1 year ago

Hi @AshleyScirra,

All browsers to have a general implementation of the V2 activation model for all input types and all APIs.

Agree. That's the shared goal.

However, it keeps sounding like you are conflating whatever Chrome implements with the HTML spec's activation model. Those things might not align so please keep that in mind.

I gave some examples above where today Chrome and other browsers currently disagree:

window.open(), Fullscreen API, and probably other things consuming user activation.

And there are places where Chrome may be ahead (see the WebKit bug I filed), and @mustaqahmed mentioned that Google has done a bunch of work to migrate things over.

The test case I provided, where a short await prevents the ability to use window.open() in Safari, is exactly the type of cross-browser inconsistency I'd like to see cleared up, and is the reason for this proposal.

That would be greatly appreciated.

However, please note that there is nothing in the HTML spec that connects window.open() to transient activation.

You may have identified literally "unspecified behavior".

It looks to me like the V2 activation model should cover this case,

It doesn't, AFAIK: if you click on "consume user activation" in HTML or you go to the definition of window.open() there is nothing in HTML that talks about "transient activation".

based on the HasConsumableUserActivation bit

I don't know what HasConsumableUserActivation is? Where did you find this? Or is this something you're proposing?

being set for a timeout after the input event, but it doesn't work in this specific case and probably others.

Assuming the above is just "consume the user activation" (which does the magic of starting the timer).

A way to feature-detect the V2 activation model. Otherwise how do we know we can do something async before using a user-gesture limited API?

Respectfully, no. This needs to be in each spec. It's not a detectable thing.

You know because the relevant specs perform the "consume user activation" or "has transient activation" check.

We have some clunky workarounds and sometimes we need to know ahead of time if they ought to be used. The presence of navigator.userActivation lets us feature detect this.

Again - please please please don't do that! That's an extremely bad assumption: the timing across user agents is going to differ and your transient activation could be consumed by basically anything (including the browser for whatever reason). You can't make any assumptions there. For instance, your page could be BFCached and the script wouldn't have a clue that the world has changed and you've lost transient activation.

(I did not originally mention this in the proposal - perhaps I should have done, but I think I originally assumed that this was the same as point 1, which perhaps was an incorrect assumption.) Otherwise we have to fall back to parsing user agent strings and all the pitfalls of that.

No. You've raised valid issues around this... and it has resulted in good bugs! (again, the bug I filed for WebKit).

If the V2 model is insufficient for some specific cases, I'm happy for browsers to extend it at their discretion - after all, the wording of the V2 model refers to "an expiry time defined by the browser" which could be interpreted in a suitably flexible way. I don't think I really understand what Safari's specific exemptions around AJAX and Web Share are beyond the V2 model.

Again, there are no "Safari's specific exemptions". We are not doing anything HTML doesn't say to do.

If other browsers are doing something different, they are in violation of HTML. That's not a bad thing if it's "doing the right thing"™️ by developers. We just need to get that fixed in HTML.

The problem is that the V2 model appears to be either too limiting or broken.

But what I'm really after is the V2 model to be a minimum baseline that can always be relied upon for all APIs.

Again, that's totally the goal. But please please please, don't come in with a mindset of "Safari is doing the wrong thing".

It's entirely possible for another browser to exhibit what you consider to be the right behavior, while not adhering to the standards. I'm not suggesting anyone should regress their behavior, just that the behavior might not be specified.

Any exceptions beyond that are welcome! It only makes it easier to use user-gesture limited APIs.

So if I understand correctly, my proposal would involve using V2 activation consistently for all APIs as a minimum guarantee, completely removing the "old model", and implementing navigator.userActivation as a feature detection signal that the V2 model can be relied upon.

Again, no 🥲. If the UserActivation API will be used in the way you suggest, then that's really bad for the Web. I would be extremely reluctant to enable it in WebKit. What you are suggesting is extremely bad practice. I can't say that enough. Please don't do the above, ever.

I apologise for any frustration in trying to communicate this

You don't need to apologize. I 100% understand where you are coming from. I know this is not your problem and you "just want stuff to work"™️. And I understand you are trying to hack around the problems because you have a real product that relies on all this... and it's all quite an incompatible mess.

But I absolutely promise you we are working on fixing the things you said above.

Just please, don't rely on the UserActivation API to mean what you said above. That will poison the well.

but user activation inconsistencies have been a pain point for us for years now and I'm keen to make sure it is clear what web developers need and what solving this would mean in the eyes of a web developer.

Understood. That's quite evident now. I think the path forward is pretty clear:

We need to re-evaluate if the V2 is fit for purpose.
WebKit needs to transition a few APIs over away from the old model (bug filed).
Specs may need to be updated to check, and if needed consume, transient activation.
We need better tests - and ones that don't rely on non-standard behavior.
We may need a way to consume transient activation via Web Driver or directly through the API.

Folk here, agree with the above?

domenic commented 1 year ago

However, please note that there is nothing in the HTML spec that connects window.open() to transient activation.

This is not correct.

It doesn't, AFAIK: if you click on "consume user activation" in HTML or you go to the definition of window.open() there is nothing in HTML that talks about "transient activation".

There is. Step 8 of rules for choosing a navigable, which is the part of window.open() that actually creates the window, requires transient activation.

What you are suggesting is extremely bad practice.

I don't really agree with that. I think it's reasonable for someone to expect that a browser implements navigator.userActivation, at the same time as they transition all their existing APIs to follow the various specs which use the user activation model in the HTML spec (instead of the legacy models they used before). Then it would be a reasonable feature detection mechanism, to determine whether e.g. a browser has connected window.open() to their old nonstandard model, or to the new standard model.

In other words, I think it's reasonable for web developers not to expect browsers to ship with two separate user activation models, with different APIs using each.

marcoscaceres commented 1 year ago

which is the part of window.open() that actually creates the window, requires transient activation.

Right, sorry - and that makes sense. But it doesn't consume it (different discussion, I know).

I don't really agree with that. I think it's reasonable for someone to expect that a browser implements navigator.userActivation, at the same time as they transition all their existing APIs to follow the various specs which use the user activation model in the HTML spec (instead of the legacy models they used before).

I was thinking about this also. It would only be prudent to expose the UserActivation API once all the APIs had been migrated over. However, migrating all the APIs could take a Very. Long. Time.

My point is that developers (and browser vendors) could benefit from the API being exposed without the above requirement. There are already tests appearing in WPT that rely on UserActivation being available.

That's why I'm saying that the UserActivation API is right now a bad proxy for "user activation is implemented everywhere uniformly" - and may be so for a while.

In other words, I think it's reasonable for web developers not to expect browsers to ship with two separate user activation models, with different APIs using each.

We all agree with this - this is the ideal world we want to get to... but that's not the world we are in, hence this bug.

AshleyScirra commented 1 year ago

So I guess what I'm asking for here is also "please update the spec to talk about the new user activation model where appropriate". In fact I would request that the spec is changed to align with what Chrome ships today, as that is the most useful form of user activation (and what I thought corresponded to the spec). I had assumed the spec got updated accordingly for all user-activation-restricted APIs when the v2 activation model went into the spec; if not, then that would also need to be done to make the spec consistent and then result in consistent behavior across browsers. I am indeed coming from a "just want it to work" perspective though!

I don't know what HasConsumableUserActivation is? Where did you find this?

I was looking at https://mustaqahmed.github.io/user-activation-v2/, but it does say it's out of date, I probably shouldn't refer to it any more. I think it is the same thing as "transient activation" in the spec wording.

Again - please please please don't do that!

I think this is the kind of thing where there is a difference between the perspective of spec authors/browser developers and a perhaps more pragmatic just-get-it-working view of a web developer.

We have an entire game engine that can run in a web worker with OffscreenCanvas. However this only works if v2-style transient activation is supported (as it is in Chrome), because all inputs are sent via postMessage() to the worker, and all API calls not available in a worker are made by posting back to the main thread. This typically happens within one or two frames (16-32ms) as the game ticks its logic. This is well within any reasonable "transient activation" timeout and so everything still works: window.open, Fullscreen API, requesting user media, geolocation, clipboard, speech synthesis, etc. etc.

Now suppose Safari or Firefox implement OffscreenCanvas, but not user activation equivalent to Chrome. If we feature-detect just OffscreenCanvas and enable it based on that, it means postMessage() can lose the user activation and so a subsequent post back to the main thread will fail to access some APIs. That's a big breaking change for us and would mean suddenly lots of content is broken. I want to avoid that. So I need a way to feature-detect user activation rules that support this architecture. The best thing I've been able to find is navigator.userActivation. Sorry - we're already using that! If there is a better way to feature detect this, other than UA-parsing (which is a big mess of its own), then I would be happy to use that. But I don't believe there is any. And if we don't attempt to feature detect it then we risk all our content being broken when other browsers ship OffscreenCanvas. I can't accept that risk so I have to do something to mitigate it.
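The detection described here might look like the sketch below. `chooseEngineMode` and the `"worker"` / `"main-thread"` return values are hypothetical names, and treating the presence of `navigator.userActivation` as a signal for worker-friendly activation rules is exactly the assumption disputed elsewhere in this thread, so this shows the workaround as described rather than a recommended practice.

```javascript
// Sketch: decide whether to run the engine in a worker. Treats the presence
// of navigator.userActivation as a proxy signal (a contested assumption).
function chooseEngineMode(globalScope) {
  const hasOffscreenCanvas = typeof globalScope.OffscreenCanvas === "function";

  const nav = globalScope.navigator;
  const hasUserActivationApi =
    nav !== null && typeof nav === "object" && "userActivation" in nav;

  return hasOffscreenCanvas && hasUserActivationApi ? "worker" : "main-thread";
}

// Possible browser usage (not executed here):
// const mode = chooseEngineMode(globalThis);
```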

Another example is our web app does a very short await and then calls window.open in order to preview some content. With Chrome's activation model, that's fine. But in other browsers the popup is always blocked with default settings; we can ask them to click another button, but then we prompt the user over and over again through a session, and the usability sucks. It turns out Safari lets you keep user activation through nested timers for up to 1 second (I don't know if that's per the spec or not, but it works). So we poll a nested timer and call window.open() when the async work is done. It works, and the usability is better for our customers. Yeah, it's an ugly hack. But unless there's a better option, we'll hack it, because the other option is a worse web app for our customers.
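The nested-timer workaround can be sketched roughly as follows. `openWhenReady`, its option names, and the 50 ms polling interval are assumptions for illustration; the 1 second ceiling mirrors the observation above and is not a documented guarantee. The `schedule` option exists only so the polling can be driven by something other than `setTimeout`.

```javascript
// Sketch of polling through nested timers until async work has finished,
// then calling the gated API from inside the timer callback.
function openWhenReady(isReady, openWindow, options = {}) {
  const { intervalMs = 50, maxWaitMs = 1000, schedule = setTimeout } = options;
  let waitedMs = 0;

  return new Promise((resolve) => {
    function poll() {
      if (isReady()) {
        resolve(openWindow());
        return;
      }
      waitedMs += intervalMs;
      if (waitedMs > maxWaitMs) {
        resolve(null); // gave up: the gesture is assumed to be lost by now
        return;
      }
      // Each timer is scheduled from inside the previous timer's callback.
      schedule(poll, intervalMs);
    }
    // The first check runs synchronously, inside the input event handler.
    poll();
  });
}

// Possible browser usage (not executed here):
// let done = false;
// doAsyncWork().then(() => { done = true; });
// openWhenReady(() => done, () => window.open(previewUrl));
```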

As a rule of thumb, web developers generally need feature-detection for any observable difference between browsers. If it doesn't exist, then we still have a problem that needs solving (and often customers actively complaining about it who want it solved), so we hack something if we can. If navigator.userActivation is not meant to be used for feature-detecting this, perhaps there should be something else? But what would that be? I'm not sure, and so long as browsers don't align and don't allow web content to detect differences, web developers are forced to make workarounds, and we can't eliminate the web compat risk. For our part, we actively monitor this stuff and will issue software updates promptly if anything needs changing.

marcoscaceres commented 1 year ago

it means postMessage() can lose the user activation

I'm confused as to why postMessage() would consume user activation? Transient activation is associated with the window, not with anything else (aside from the APIs that explicitly consume it). Also, which APIs in a worker depend on having transient activation? That seems wrong...

So I need a way to feature-detect user activation rules that support this architecture.

You can't assume that... there may be APIs that continue to use the old model forever for web compat reasons. Thus, you just can't make that arbitrary assumption. The UserActivation API only exposes two things: "isActive" and "hasBeenActive", that's it! It has nothing to do with "does this browser implement all the user activation things everywhere?".
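For reference, reading those two properties could look like the sketch below. `readActivationState` is a hypothetical helper; `navigator.userActivation.isActive` and `hasBeenActive` are the two properties named above. The result only describes the window's current activation state and says nothing about which model any individual API uses.

```javascript
// Sketch: report what the UserActivation API exposes, if it exists at all.
function readActivationState(nav) {
  const activation = nav ? nav.userActivation : undefined;
  if (!activation) {
    // API not exposed: the state is unknown, not "inactive".
    return { supported: false, isActive: null, hasBeenActive: null };
  }
  return {
    supported: true,
    isActive: activation.isActive, // transient activation right now
    hasBeenActive: activation.hasBeenActive, // sticky activation
  };
}

// Possible browser usage (not executed here):
// console.log(readActivationState(navigator));
```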

Further, there is nothing in HTML that says "user agents MUST NOT expose the User Activation API unless they implement V2 Model everywhere". That would be unreasonable and impractical.

Sorry - we're already using that!

You are doing it at your own and your users' detriment 😢

If there is a better way to feature detect this

It needs to be handled on a API-by-API basis (see below!).

Safari lets you keep user activation through nested timers for up to 1 second (I don't know if that's per the spec or not, but it works).

This sounds like "old model"... This is good. I can look into this, it's actionable, and something we can probably fix! 🥰

As a rule of thumb, web developers

I've been developing web pages since I was 16... I'm now 43. Believe me, I know this game well.

is not meant to be used for feature-detecting this, perhaps there should be something else?

Just come here and tell us what's broken. Half joking, the way to "feature detect" is: "is the bug for X open in bugs.webkit.org"? We are making a concerted effort to prioritize and fix stuff as part of this interop effort.

For our part, we actively monitor this stuff and will issue software updates promptly if anything needs changing.

That's great. Please keep doing that. We will hold up our end of the bargain by fixing bugs also.

Just to finish off, this is really helpful @AshleyScirra. I really appreciate that you've provided the detailed responses and the amount of thought you've given this stuff. I know we've gone around in circles quite a bit, but it's been quite fruitful. If you can keep telling us specific APIs that are affecting your work/app, then we can try to prioritize those.

domenic commented 1 year ago

You can't assume that... there may be APIs that continue to use the old model forever for web compat reasons. Thus, you just can't make that arbitrary assumption. The UserActivation API only exposes two things: "isActive" and "hasBeenActive", that's it! It has nothing to do with "does this browser implement all the user activation things everywhere?".

To be clear, the ask here is not for the existence of navigator.userActivation to imply "the browser implements all the user activation things everywhere". It's for it to imply, "for those user activation things the browser does implement, it does so according to the modern user activation spec, instead of doing so according to old nonstandardized behaviors".

Further, there is nothing in HTML that says "user agents MUST NOT expose the User Activation API unless they implement V2 Model everywhere". That would be unreasonable and impractical.

I don't think that would be unreasonable and impractical. We could add that to HTML if it would help web developers. Indeed, the only reason we haven't so far, is that we assumed all implementations would follow the path of only exposing web developers to a single model. That's what we did in Chrome: we developed the new model in parallel to the old one, behind a flag, and once we flipped the flag, all call sites were updated to the new model, in a single release.

That seems like the most beneficial strategy for web developers, which is why it makes sense they might assume that browsers would follow it.

This approach makes way more sense than adding a feature detection API for every API that relies on the user activation spec. E.g. window.windowOpenUserActivationMode, window.paymentRequestUserActivationMode, window.fullscreenUserActivationMode, all of which return either "non-standard" or "standard". It would be very silly to ask browsers to implement such feature detection APIs, in my opinion; the effort of doing so would conceivably be more than just updating the APIs to use the new C++ function call behind the scenes, and it would be a strange API to expose in the long term.

Earlier you stated concern about such switches taking

a Very. Long. Time.

Maybe @mustaqahmed can comment on how much time it took for Chromium, but my impression was that it wasn't so hard to find all call sites of the old C++ functions, and update them to the new ones. Most of the time was in designing the model (which is done), and checking on the web compatibility of any changes (which is done for at least Chromium, and in general wasn't so hard because the new model is generally more permissive than the old one).

AshleyScirra commented 1 year ago

I'm confused as to why postMessage() would consume user activation?

It doesn't consume user activation. Under the old model, anything async in a user input event means you lose user activation (in this case, posting to a worker and waiting for a message to come back is fundamentally async). Under the v2 model, a short async bit of work can be done and still have transient activation. So the v2 model means you can do a short bit of async work and then successfully do something that consumes activation.

You can't assume that...

I know it kind of sucks, but I don't see a better option unfortunately. I will 100% use a better option if one is provided, but I don't believe there is one yet (unless this does become an official feature detection signal). We already deal with a bunch of browser bugs and inconsistencies in various hacky ways - usually filing browser bugs along the way, but sometimes they don't get fixed long-term - so to me this doesn't seem that much different to that kind of thing anyway.

there may be APIs that continue to use the old model forever for web compat reasons.

As I understand it the v2 model is backwards compatible. The old model seems to be "user activation is only in a synchronous user input event". The v2 model is roughly "user activation is in a synchronous user input event and a short time period afterwards". So perhaps there is little backwards compatibility risk to changing this? From @domenic's comment it sounds like the Chrome team didn't struggle too much with backwards compatibility.
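
To make the difference concrete, here is a toy simulation of the v2 idea (a sticky flag plus a transient window that expires after a timeout). It is only a sketch: the class name `ActivationState`, its methods, and the 1000 ms window used in the usage note are assumptions for illustration, not the spec's algorithm or any browser's actual timeout.

```typescript
// Toy model of "two flags and a short timeout"; timestamps are plain numbers
// supplied by the caller so the behaviour is deterministic.
class ActivationState {
  hasBeenActive = false;            // sticky: set by the first gesture, never cleared
  private activatedAt = -Infinity;  // time of the most recent gesture
  private readonly windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;       // assumed transient window length
  }

  // Called when a user input event is dispatched.
  notifyGesture(now: number): void {
    this.hasBeenActive = true;
    this.activatedAt = now;
  }

  // Transient activation: true only within the window after a gesture.
  isActive(now: number): boolean {
    return now - this.activatedAt < this.windowMs;
  }

  // A consuming API succeeds at most once per gesture.
  consume(now: number): boolean {
    if (!this.isActive(now)) {
      return false;
    }
    this.activatedAt = -Infinity;   // reset so a second consuming call fails
    return true;
  }
}
```

With `new ActivationState(1000)`, a gesture at time 100 followed by 50 ms of async work still allows `consume(150)` to succeed, which is the case the old synchronous-only model rejects. A second `consume` right after that fails, and so does any call made 1000 ms or more after the gesture.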

I don't think that would be unreasonable and impractical. We could add that to HTML if it would help web developers.

As I mentioned it is indeed useful to have a way to feature detect the new model. If the presence of navigator.userActivation did indeed become that feature detection signal, that would solve the problem.

mustaqahmed commented 1 year ago

a Very. Long. Time.

Maybe @mustaqahmed can comment on how much time it took for Chromium, but my impression was that it wasn't so hard to find all call sites of the old C++ functions, and update them to the new ones. Most of the time was in designing the model (which is done), and checking on the web compatibility of any changes (which is done for at least Chromium, and in general wasn't so hard because the new model is generally more permissive than the old one).

I agree: converging to the correct model plus gradually addressing compat problems needed significant bandwidth from us (Chrome), and any new implementation work would greatly benefit from this.

However, another significant chunk of our effort went into fixing (many!) internal test failures caused by historical/incremental assumptions about how the then-semi-defined activation model should work. It's likely any old code-base would have to face such a cleanup job, so let's defer some of these topics to a later (2024) goal.

The problem is that the V2 model appears to be either too limiting or broken.

Please check/file HTML issues so that we can track any longer term discussion separately without blocking this interop discussion.

I thought it would be great if we can carve out a 2023 goal before 2023 starts, and avoid "too many ~cooks~ APIs spoiling the broth" 😉! We can perhaps start with only the APIs that are already on (or very close to) standards track.

So, my proposal for 2023 is to target user activation interop as exposed by the following "Bucket 1" APIs. For convenience, I have created the 3 buckets of "user APIs"; let me know if I missed or misplaced some APIs.

Bucket 1: specs that properly reference the HTML activation concept and are on a standards track

Payment Request API (consuming)
Fullscreen API: mentions activation requirement, a PR to consume got consensus!
Clipboard API (transient non-consuming)
Vibration API (sticky activation)
Web Audio API (sticky activation)
Web Share API
Web XR Device API
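
As a rough illustration of the three labels used in the list above (consuming, transient non-consuming, sticky), the sketch below gates a call on a pair of flags shaped like `navigator.userActivation`. The names `Requirement`, `Flags`, `bucket1` and `gate` are hypothetical. Only the APIs that carry an explicit label in the list are included, and real browsers implement these checks internally rather than in page script.

```typescript
type Requirement = "sticky" | "transient" | "consuming";

interface Flags {
  isActive: boolean;      // transient activation
  hasBeenActive: boolean; // sticky activation
}

// Labels copied from the Bucket 1 list; unlabeled entries are left out.
const bucket1: Record<string, Requirement> = {
  "Payment Request API": "consuming",
  "Clipboard API": "transient",
  "Vibration API": "sticky",
  "Web Audio API": "sticky",
};

// Returns whether a call with the given requirement would be allowed,
// clearing the transient flag when the requirement is consuming.
function gate(req: Requirement, flags: Flags): boolean {
  if (req === "sticky") {
    return flags.hasBeenActive;
  }
  if (!flags.isActive) {
    return false;
  }
  if (req === "consuming") {
    flags.isActive = false; // a consuming API uses up the transient activation
  }
  return true;
}
```

Under this toy model, two clipboard calls in a row both pass, while a second payment request after the first is rejected because the first one consumed the activation.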

Bucket 2: specs that properly reference the HTML activation concept and are not on a standards track

Bucket 3: specs that need to be edited to properly reference the HTML activation concept

Opening form input elements: DateTimeChooser, ColorInput, FileInputType etc.
Popups (window.open): mentions activation requirement but w/o consumption. Complicated because of overlap with non-standard popup-blocker logic.

mustaqahmed commented 1 year ago

(Note: I have edited the comment above a few times to correct API buckets.)

EdgarChen commented 1 year ago

About the user gesture bucket 1 APIs, we are unlikely to focus on the following APIs in 2023 due to resource and priority reasons.

Payment Request API (not yet exposed)
Vibration API (only implemented on mobile)
Web Share API (only exposed on mobile)
Web XR Device API (not yet exposed)

So we would like to propose excluding them from the list. We support the Fullscreen API, the Clipboard API and Web Audio (cc @alastor0325 for Web Audio). The tests at https://wpt.fyi/results/html/user-activation cover the core implementation, and we propose to exclude most of them: if users are going to use navigator.userActivation for feature detection, we probably need to move our window.open to the spec model before we expose the interface. And per https://github.com/web-platform-tests/interop/issues/142#issuecomment-1293638649, each user API might need its own tests.

Note the scope of the original user gesture proposal was ambiguous. It took longer for people to narrow down the scope. The current GH proposal came after the exclusion deadline, so we are also not sure if the “partial support” is still viable. @jgraham

jgraham commented 1 year ago

Well we are just over a week from the deadline for making a final decision on which protocol areas to adopt, and more than a week beyond the proposed deadline for making the proposals detailed enough that they could be clearly assessed.

However given there's real pain here, it seems to me like it would be worthwhile to consider a clearly scoped proposal, even at this stage. But for others it may already be at the point where there isn't time to reassess the proposal in light of any changes.

It seems obvious to me that this can't end up requiring support for APIs that happen to require activation which UAs are otherwise unable to implement. So I think we'd need to see a clear set of tests which cover the points of difference between browsers but don't depend on features that aren't universally implemented.

mustaqahmed commented 1 year ago

Based on the comments above, I am splitting "Bucket 1" (from my previous comment) into two sub-buckets as follows, and proposing to target Bucket 1.1 for 2023:

Bucket 1: specs that properly reference the HTML activation concept and are on a standards track

Bucket 1.1: already supported by major browsers

Clipboard API (transient non-consuming)
Web Audio API (sticky activation)

Bucket 1.2: not yet supported by major browsers

Fullscreen API: mentions activation requirement, a PR to consume got consensus!
Payment Request API (consuming)
Vibration API (sticky activation)
Web Share API
Web XR Device API

(I moved Fullscreen to Bucket 1.2 because Safari doesn't support consumption yet.)

EdgarChen commented 1 year ago

Sounds good, thanks for your responses. :)

marcoscaceres commented 1 year ago

The above seems reasonable... with the hope that at least Fullscreen will also get included (it's hopefully a small change, so I'd encourage us to add it).

With Payment Request, at least Chrome and Safari should be fully interoperable and already doing the right thing per spec.

And even though Web Share is not available across all platforms, it should be broadly interoperable with respect to user activation (for where it is available).

gecko
webkit
... maybe someone can find the one in Chromium?

mustaqahmed commented 1 year ago

To complete @marcoscaceres's list above, here is Blink's Web Share code.

At this point, we can broaden our 2023 Interop goal above (i.e. Bucket 1.1) to: A. add all three of Fullscreen, Payment and Web Share, assuming new implementations are easy, or B. add just Web Share so that cross-browser implementation of the other two APIs doesn't become a blocker.

Please vote for 🅰️ or 🅱️ so that we can quickly decide.

marcoscaceres commented 1 year ago

🅰️ from me.

jgraham commented 1 year ago

We were working on the basis of the "Bucket 1.1" scope, and don't think the other work is the same level of priority.

Also, as a process point, this is extremely late to be changing the scope, given that people are already firming up positions, and trying to broaden the scope at this stage will invalidate that work, and thus is unlikely to be well received.

marcoscaceres commented 1 year ago

@jgraham, happy to push this to 2024, as the APIs listed are broadly interoperable anyway.

I guess in parallel, we can all come up with a broader interop plan by expanding on @mustaqahmed's list (https://github.com/web-platform-tests/interop/issues/142#issuecomment-1309048359).

I'd really like to cross reference that with the larger list of APIs that depend on some kind of user gesture.

nairnandu commented 1 year ago

Thank you for proposing user gestures for inclusion in Interop 2023.

We wanted to let you know that this proposal was not selected to be part of Interop this year. We had many strong proposals, and could not accept them all. As discussed in the issue comments, it was hard to find a subset of this proposal that was itself an interop priority and only depends on features that are themselves widely implemented.

For an overview of our process, see the proposal selection summary. Thank you again for contributing to Interop 2023!

Posted on behalf of the Interop team.

marcoscaceres commented 1 year ago

We should definitely see if we can get this into 2024. Hopefully we can just reopen the issue when the time is right?

marcoscaceres commented 1 year ago

(In other news, the User Activation API is enabled in Safari Tech Preview… so, that’s something 😊)

foolip commented 1 year ago

We should definitely see if we can get this into 2024. Hopefully we can just reopen the issue when the time is right?

That would be great. We haven't defined the process for next time yet, but I expect a new issue will be clearer. It would need to explain what's changed since last time.

AshleyScirra commented 9 months ago

Can I request that this proposal is resubmitted for Interop 2024? I could file a new issue but it seems valuable to preserve the existing discussion. (I don't appear to have permission to reopen this issue myself.)

foolip commented 9 months ago

@AshleyScirra please file a new issue and link to this one. I'd suggest naming it user activation to match spec terminology.

AshleyScirra commented 9 months ago

Re-submitted for 2024 at #428.

