w3c / webdriver

Remote control interface that enables introspection and control of user agents.
https://w3c.github.io/webdriver/
Other
685 stars 195 forks source link

Input events dispatch to top-level frame #1847

Open sadym-chromium opened 1 month ago

sadym-chromium commented 1 month ago

As discussed in https://github.com/w3c/webdriver-bidi/issues/795, the actions should be dispatched from the top-level browsing context.


Preview | Diff

sadym-chromium commented 1 month ago

@whimboo WDYT?

OrKoN commented 1 month ago

The following two things seem to be still missing:

1) steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits? 2) apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

whimboo commented 1 month ago

The following two things seem to be still missing:

  1. steps to convert the coordinates to the top-level browsing context coordinate space if they are frame scoped. Edit: unless this is covered by the implementation-defined bits?

Yes, absolutely. By changing the way where we dispatch the events the coordinates should be exactly the same. Not taking care of offsets would cause quite a lot of regressions for those users who make use of actions a lot.

Therefore see:

https://github.com/web-platform-tests/wpt/pull/48147 https://github.com/web-platform-tests/wpt/pull/48123

Given that Chrome seems to already dispatch actions in the top-level browsing context the referenced tests are failing.

2. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Yes, we would have to do it for all the pointer and wheel input sources.

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

sadym-chromium commented 1 month ago
  1. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

sadym-chromium commented 1 month ago

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

whimboo commented 1 month ago

@OrKoN @whimboo I added calculation of the offset relative to the parent browsing context, but IDK how well it will work

I think that we should have more wdspec tests (similar to mine for mouse) so we cover at least each input source (including sub sources) and verify implementations against. Did you get Chrome working for my example?

OrKoN commented 1 month ago

@sadym-chromium it looks like each action type dispatches on the context parameter https://w3c.github.io/webdriver/#dfn-dispatch-a-keydown-action

  1. apply the same fix to other actions (I assume we do actually want to dispatch keyboard events through the top-level frame as well)

Aren't all the actions end up in perform implementation-specific action dispatch steps?

that's right, overlooked that.

OrKoN commented 1 month ago

Keyboard actions are more challenging because they could allow users to trigger shortcuts that access restricted browser features. This would require additional checks to determine what actions are permitted and which ones should be blocked. Maybe we should have a follow-up issue for it?

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts. Perhaps we need a better definition of what the dispatch of an event means, perhaps hook into native event handles in https://www.w3.org/TR/uievents/#handle-native-mouse-move-id?

whimboo commented 1 month ago

so actually I am not sure, even with this change it is not quite defined what it means that the actions are dispatched to the top-level browsing context. For example, in Chrome that is the case but still would not give you access to browser shortcuts.

Yes you are right. I mixed it up with the native event dispatching that I'm working on as well right now. In those cases we would have that particular issue but not when dispatching it in the content process of the top-level browsing context.