w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation
https://w3c.github.io/webdriver-bidi/

feat: support calling WebDriver Classic commands through WebDriver Bidi bridge #701

Closed christian-bromann closed 2 months ago

christian-bromann commented 2 months ago

Hey,

With BiDi and its support for managing different contexts more easily, it may be worth considering a bridge between WebDriver BiDi and Classic that allows developers to call Classic commands via BiDi. I could imagine the following: we introduce a webdriver module that has an execute command, e.g.:

webdriver.Execute = (
  method: "webdriver.execute",
  params: webdriver.ExecuteParameters
)

webdriver.ExecuteParameters = {
  url: text,
  context: browsingContext.BrowsingContext,
  ? payload: text / null
}

payload would be the stringified request body for the command. I don't have any strong opinions on how this should look in detail. This is intentionally kept simple.
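To make the proposed shape concrete, here is a minimal sketch of how a client might build such a message. The webdriver.execute command is only the proposal from this issue and does not exist in the specification; the helper name build_execute_command, the assumption that url holds a Classic endpoint path relative to the session, and the sample context id are all illustrative.

```python
import itertools
import json

# BiDi commands carry a client-chosen numeric id; a simple counter is enough here.
_command_ids = itertools.count(1)


def build_execute_command(url, context, payload=None):
    """Build a message for the HYPOTHETICAL webdriver.execute command.

    url:     assumed to be the Classic endpoint path, relative to the session
    context: browsing context id the Classic command should run against
    payload: optional request body, stringified as the proposal suggests
    """
    params = {"url": url, "context": context}
    if payload is not None:
        params["payload"] = json.dumps(payload)
    return {
        "id": next(_command_ids),
        "method": "webdriver.execute",  # not part of WebDriver BiDi
        "params": params,
    }


message = build_execute_command(
    url="/element/some-element-id/click",  # illustrative path
    context="iframe-c-context-id",         # illustrative context id
    payload={},
)
print(json.dumps(message, indent=2))
```

The optional payload is omitted entirely when there is no body, which matches the `? payload` marker in the definition above.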

This could enable us to simplify user interactions across multiple browsing contexts. Imagine a page with 3 nested iframes (e.g. root > iframeA > iframeB > iframeC). To interact with elements in iframeC, a user would need to call a set of commands to identify all iframes and switch into them one by one. Then, to continue on the root page, the user would have to switch back. This is known to be a tedious and error-prone process. Framework authors like me could make this very easy and essentially remove the need to care about switching contexts at all.
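For illustration, the Classic round trips for the nested iframe case can be sketched as a plan of HTTP requests. The endpoints (Find Element, Switch To Frame, Element Click) and the web element identifier key come from WebDriver Classic; the function name, the selectors, and the placeholder ids are made up for this sketch, and nothing is actually sent.

```python
# Key used by WebDriver Classic to serialize an element reference.
WEB_ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"
BASE = "/session/<session id>"  # placeholder, filled in by a real client


def plan_click_in_nested_frame(frame_selectors, target_selector):
    """Return the ordered (method, path, body) requests for one click."""
    steps = []
    for selector in frame_selectors:
        # 1. Find the iframe element in the current browsing context.
        steps.append(("POST", BASE + "/element",
                      {"using": "css selector", "value": selector}))
        # 2. Switch into it, using the element id the server returned.
        steps.append(("POST", BASE + "/frame",
                      {"id": {WEB_ELEMENT_KEY: "<id returned for %s>" % selector}}))
    # 3. Find and click the target inside the innermost frame.
    steps.append(("POST", BASE + "/element",
                  {"using": "css selector", "value": target_selector}))
    steps.append(("POST", BASE + "/element/<id returned for target>/click", {}))
    # 4. Switch back to the top-level context ("id": null on the wire).
    steps.append(("POST", BASE + "/frame", {"id": None}))
    return steps


plan = plan_click_in_nested_frame(
    ["iframe#a", "iframe#b", "iframe#c"], "button.submit"
)
for method, path, body in plan:
    print(method, path, body)
```

Every request depends on the state left behind by the previous one, which is why the sequence cannot be run in parallel against several frames.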

Technically I could already switch to a certain context with browsingContext.activate and then execute a Classic command. However, this would only work if I run one command at a time, and I would like to be able to do this in parallel.

EDIT: it seems like browsingContext.activate only works for top-level contexts. I am probably missing something here, as I am not sure how the context id of an iframe helps me then. While I can locate elements via browsingContext.locateNodes in those iframes, I couldn't further interact with them.

This has been discussed before in #546 and I wonder if we may know more about this type of limitation.

OrKoN commented 2 months ago

While I can locate elements via browsingContext.locateNodes in those iframes, I couldn't further interact with them.

Could you clarify this part? If you locate them using browsingContext.locateNodes, you can pass the results to the script.* API, screenshots, and other WebDriver BiDi APIs.
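A minimal sketch of that flow, building the two BiDi messages without sending them: the command and parameter names (browsingContext.locateNodes, script.callFunction, locator, sharedId, awaitPromise) are from WebDriver BiDi, while the helper names, the context id, and the abbreviated sample response are assumptions made for illustration.

```python
def locate_nodes_command(command_id, context, css_selector):
    """Message asking the browser for nodes matching a CSS selector."""
    return {
        "id": command_id,
        "method": "browsingContext.locateNodes",
        "params": {
            "context": context,  # may be the context id of an iframe
            "locator": {"type": "css", "value": css_selector},
        },
    }


def click_node_command(command_id, context, shared_id):
    """Message passing a located node to a script as an argument."""
    return {
        "id": command_id,
        "method": "script.callFunction",
        "params": {
            "functionDeclaration": "(node) => node.click()",
            "awaitPromise": False,
            "target": {"context": context},
            # A node found by locateNodes is referenced by its sharedId.
            "arguments": [{"sharedId": shared_id}],
        },
    }


# Made-up, abbreviated response to the locateNodes command.
sample_response = {
    "type": "success",
    "id": 1,
    "result": {"nodes": [{"type": "node", "sharedId": "example-shared-id"}]},
}

context_id = "iframe-c-context-id"  # illustrative
locate = locate_nodes_command(1, context_id, "button.submit")
shared_id = sample_response["result"]["nodes"][0]["sharedId"]
click = click_node_command(2, context_id, shared_id)
print(click)
```

Note that node.click() inside a script dispatches a synthetic click. The same sharedId can also be used as an element origin in input.performActions when real input events are needed.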

jgraham commented 2 months ago

Conceptually this doesn't really work, or at least what could work doesn't seem very interesting.

In general, the stuff you can do in classic should also be possible in BiDi (possibly with some extra effort). To the extent that it isn't, that's a missing feature and we should figure out a plan to fix the gap. However, in the meantime one can use both classic and BiDi in the same session, with the limitation that one has to talk to classic using HTTP.
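As a rough sketch of what using both in the same session looks like: a Classic New Session request that sets the webSocketUrl capability to true gets back a WebSocket URL for BiDi alongside the usual session id. The capability name is from the specification; the helper names, the driver URL, and the sample response below are made up for illustration.

```python
def new_session_body():
    """Body for POST /session that opts in to a BiDi connection."""
    return {"capabilities": {"alwaysMatch": {"webSocketUrl": True}}}


def extract_endpoints(driver_url, response):
    """Pick out where to send Classic (HTTP) and BiDi (WebSocket) traffic."""
    value = response["value"]
    return {
        "classic_http": "%s/session/%s" % (driver_url, value["sessionId"]),
        "bidi_websocket": value["capabilities"]["webSocketUrl"],
    }


# Made-up sample response; a real driver returns many more capabilities.
sample_response = {
    "value": {
        "sessionId": "example-session",
        "capabilities": {
            "webSocketUrl": "ws://localhost:4444/session/example-session"
        },
    }
}

endpoints = extract_endpoints("http://localhost:4444", sample_response)
print(endpoints)
```

Classic commands still go over HTTP to the first endpoint, while BiDi commands and events use the second.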

We could make it possible to send classic commands over websockets, but it would come with all the same limitations of classic: one context at a time, one command at a time.

That's because it isn't simple to just do away with the global shared state of classic. The model of having one running command at a time, talking to one context, isn't really a limitation of the wire protocol; it's a fundamental part of the specification design. It also shows up in implementations. For example, in Gecko a lot of the spec-level shared state (e.g. the current browsing context) looks like shared state in the code. Trying to run multiple commands in parallel would lead to buggy and unpredictable behaviour.

So I think the best one could achieve here would be a BiDi module reflecting classic commands, where commands are queued to run one at a time, and with the possibility of an implicit switch-to-window/frame to set the right browsing context before running each command. That's not nothing, but it also doesn't obviously seem worth prioritising over expanding the feature set of BiDi.
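The queued module described above could behave roughly like this sketch: a lock serializes commands, and an implicit switch runs before a command whenever the requested context differs from the current one. This is an illustration of the idea only, not an existing API; QueuedClassicBridge, the send_classic callback, and the command names are hypothetical.

```python
import asyncio


class QueuedClassicBridge:
    """Runs Classic-style commands one at a time against shared state."""

    def __init__(self, send_classic):
        # send_classic: hypothetical coroutine (name, params) -> result
        self._send_classic = send_classic
        self._lock = asyncio.Lock()
        self._current_context = None  # mirrors Classic's global state

    async def execute(self, context, name, params=None):
        # Callers may invoke this concurrently; the lock queues them.
        async with self._lock:
            if context != self._current_context:
                # Implicit switch-to-window/frame before the command.
                await self._send_classic("switchTo", {"context": context})
                self._current_context = context
            return await self._send_classic(name, params or {})


async def demo():
    log = []

    async def fake_send(name, params):
        log.append(("start", name))
        await asyncio.sleep(0)  # yield, giving other tasks a chance to run
        log.append(("end", name))
        return name

    bridge = QueuedClassicBridge(fake_send)
    results = await asyncio.gather(
        bridge.execute("frame-a", "getTitle"),
        bridge.execute("frame-b", "getUrl"),
    )
    return results, log


results, log = asyncio.run(demo())
print(results)
print(log)
```

Even though both calls are issued together, the log never shows two commands in flight at once, so callers get convenience but no real parallelism.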

christian-bromann commented 2 months ago

Could you clarify this part?

@OrKoN that is correct, WebDriver BiDi specific APIs work.

@jgraham thanks for the clarification. I understand from this that it would be possible if every WebDriver Classic command were implemented again via BiDi with a multi-context model in mind. But since browsers have implemented WebDriver Classic with global shared state, it can't just be adapted to that.

While on the one hand it would be nice to provide this flexibility, I think iframes are more of an exception these days, and scenarios where one would like to run parallel operations on multiple browsing contexts are rare, if they exist at all. Therefore closing this. Thanks for your input.

whimboo commented 2 months ago

Also, please note that not all browsers implement classic directly in the browser itself or in the driver. For Firefox, the WebDriver protocol implementation lives in geckodriver, while BiDi runs purely within Firefox. That means we could not even run a classic command via a WebSocket connection, because Marionette uses a custom protocol.