w3c / webdriver-bidi

Bidirectional WebDriver protocol for browser automation
https://w3c.github.io/webdriver-bidi/

Reading the W3C spec I don't see the WebSocket connections being multiplexed; any reason why not? #670

Open shehackedyou opened 7 months ago

shehackedyou commented 7 months ago

I have been reviewing: https://w3c.github.io/webdriver-bidi/#transport

And I see that it is one WS connection per session. Are you multiplexing each of the tabs? Could you not multiplex each of the sessions so that one is only dealing with a single socket, and obtain a little more efficiency over the long term?

But I've just started scratching the surface of this topic, so if I'm incorrect (and I probably am) I would appreciate the details of why.


**In addition, I have another question for contributors**

I'm really interested in this protocol; I'm curious to take the temperature of WebDriver contributors on how these tools are actively being filtered by key websites, making them very difficult to use.

So do you feel that concealment is within the scope of the project? Or do you feel this is beyond the scope? I'm working explicitly on concealment right now, but if it's outside the scope I can simply not issue pull requests for that sort of functionality.

I was considering adding a transparent proxy enabling concealment, along with detection of the expectations that must be met for proper concealment. It would function as a mimic, altering outgoing traffic to match a browser operated by a user randomly drawn from a set of realistic-looking user profiles and fingerprints.

This could also be used to serve specific common JS and CSS, functioning as a local CDN and ideally reducing the number of required outgoing connections (though maybe that is meaningless given the browser's ability to cache them).

But it could explicitly block certain connections, and modify or monitor outgoing traffic to determine the expectations that must be met to achieve concealment.

This seems like the least invasive way to obtain essentially the maximum flexibility (in the past, chromedriver was modified as 'undetected-chromedriver', but that was done via binary patching, limiting the effectiveness of a solution that already has limited effectiveness).

My first thought was to patch the browser, but that's an awful 'solution'. This would be separate from the MITM proxy between WebDriver and the browser described in the documentation.

This would be a separate transparent MITM proxy wrapping the browser's HTTP/WS/etc. requests. In my opinion, a transparent MITM proxy for concealment would work well as a component of WebDriver. I understand that it has built-in proxy functionality, but ideally we want incoming/outgoing traffic to have its data types modeled, with useful associated methods and functions provided to make it accessible.

jimevans commented 7 months ago

And I see that it is one WS connection per session. Are you multiplexing each of the tabs? Could you not multiplex each of the sessions so that one is only dealing with a single socket, and obtain a little more efficiency over the long term?

I'm not sure I fully understand the question. There's nothing preventing a WebDriver BiDi session from controlling multiple tabs, which would require only one socket. Yes, you need to explicitly create the new tabs (see the browsingContext.create command), but you don't need multiple sessions to control multiple tabs.
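To illustrate the point about one socket serving many tabs: the spec's transport section frames every command as a JSON object with a client-chosen `id`, a `method`, and `params`, and the remote end echoes the `id` in its response, so commands for any number of browsing contexts can be interleaved on a single connection. Below is a minimal Python sketch of building such frames; the `bidi_command` helper is hypothetical (not part of any client library), while the `browsingContext.create` command name and the `id`/`method`/`params` frame shape come from the spec.

```python
import itertools
import json

# Monotonic command ids; the remote end echoes "id" in each response,
# which is what lets replies be matched back to commands on one socket.
_next_id = itertools.count(1)

def bidi_command(method: str, params: dict) -> str:
    """Serialize one WebDriver BiDi command frame as JSON text."""
    return json.dumps({"id": next(_next_id), "method": method, "params": params})

# Two tabs in the same session: two browsingContext.create commands,
# both of which would be sent over the same WebSocket connection.
first = bidi_command("browsingContext.create", {"type": "tab"})
second = bidi_command("browsingContext.create", {"type": "tab"})
```

In a real client these strings would be sent over the session's single WebSocket (e.g. with a library such as `websockets`), with responses dispatched by `id`.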

So do you feel that concealment is within the scope of the project? Or do you feel this is beyond the scope? I'm working explicitly on concealment right now, but if it's outside the scope I can simply not issue pull requests for that sort of functionality.

I can in no way speak for anyone in the group besides myself, but I believe that concealment of automated browsing activity from the site being automated is probably beyond the scope of this spec. Given that the protocol is expressly designed for testing of pages, and not general-purpose browser automation, I don't know that I'm interested in concealing that activity from website publishers. Enabling such behavior may present a security threat for some websites.

This is especially a gray area for me, as the Terms of Service of many (most?) commercial websites have a clause somewhere that says, in effect, "You can't use automated means to access this website." I am not willing to engage in a discussion as to whether such ToS clauses are "right" or "should or shouldn't exist" or whether it's "ethical" to circumvent them. But I, for one, am not entirely comfortable enabling concealment, however popular that design goal may be among certain sets of users.

mikestopcontinues commented 7 months ago

I believe that concealment of automated browsing activity from the site being automated is probably beyond the scope of this spec.

I understand this position, but it should at least be within spec not to provide clues as to where control is coming from. That seems like the purely neutral position.

Unless it's possible for users to avoid detection using this API with more code than they currently employ, it will cause a huge amount of waste in the industry. Many libraries won't be able to migrate, and forks of browsers may need to be created.

I think the best position is to facilitate the maximum adoption of the API. Leave users to "humanize" their gestures, leave companies to manage their ToS, and leave governments to enforce the law.

shs96c commented 7 months ago

I understand this position, but it should at least be within spec not to provide clues as to where control is coming from. That seems like the purely neutral position.

That is not a neutral position given the difficulty of detecting whether a browser is automated or not, and given the complexity of building and maintaining fingerprinting technology. The difficulty of detecting whether a browser is automated is one of the reasons that the Working Group agreed on the webdriver-active flag (exposed to content as `navigator.webdriver`) in the original WebDriver spec.

As @jimevans points out, for the use cases this spec is designed for, there is no requirement to facilitate concealing automation. For the use cases where a company may wish to use an automated browser on their own site (e.g. to implement liveness checks), it is far simpler and safer for the wider Web to enable a signal, implemented by the person writing the check, that the automated browser is "friendly".

mikestopcontinues commented 7 months ago

I know this can become a heated topic, so I promise this is the last I'll contribute:

This reasoning assumes automation is unethical, and thus that the landscape needs to be corrected in favor of certain outcomes. A neutral position would make it neither easy nor hard to spoof human behavior. It would simply accomplish a task: transmit events to and from the browser identical to those a human being could produce.

Anyway, thank you for allowing me to contribute my view.