w3c / webdriver

Remote control interface that enables introspection and control of user agents.
https://w3c.github.io/webdriver/

Exposing the PID of the browser process from the remote end to the local end - in Capabilities? #1823

Open spectranaut opened 3 months ago

spectranaut commented 3 months ago

Hi!

We'd love to be able to get the main PID of the browser process back from WebDriver implementations (like chromedriver) in order to test the accessibility API exposed by the browser. We need the PID to reliably find the browser's accessibility tree through the platform's accessibility API (otherwise, we would have to find the application by name, which is tricky if more than one instance of Firefox is open). In all cases, we only need the main PID of the browser.

We'd like to get this information from WebDriver implementations like chromedriver specifically because it would allow us to use WebDriver in our test suite. We would use WebDriver to start the browser, load test pages, and interact with the browser (like clicking buttons or running scripts), then separately query the browser through the accessibility API.

For some context: accessibility APIs are what assistive technologies like screen readers use to interact with applications like web browsers. We are working on extending WPT with tests for the HTML-to-accessibility-API mappings. If you are curious to learn more, you can read this explainer from my colleague @alice: https://github.com/Igalia/wpt/blob/explainer/docs/wpt-for-aams.md

I'm new to reading the WebDriver spec, but it seems to me that a process ID could be returned in the Capabilities object returned when starting a new Session. This might not make sense when using WebDriver over a network. In our case, we can only query the accessibility API of a browser on the same local machine.

So my questions:

Looking at the code for the chromedriver implementation, I think this won't be hard to do, and I would be happy to supply a patch if this change is supported! :)

spectranaut commented 3 months ago

Oh, I should add, this is different work than the get computed role and get computed name work -- the difference is best explained through the explainer I linked above: https://github.com/Igalia/wpt/blob/explainer/docs/wpt-for-aams.md

whimboo commented 3 months ago

Might be a good enhancement. Note that for Firefox we already return the PID of the parent process as moz:processID in the returned capabilities.
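For a local end, reading such a value is a lookup in the New Session response, whose `value` object carries `sessionId` and `capabilities`. A minimal Python sketch, assuming the raw JSON response body is already in hand; the function name is hypothetical, and `moz:processID` is used only because Firefox returns it today (there is no vendor-neutral key yet).

```python
import json
from typing import Any, Optional

# Vendor-prefixed key Firefox already returns; a standardized name does not exist yet.
FIREFOX_PID_KEY = "moz:processID"


def browser_pid_from_new_session(body: str, key: str = FIREFOX_PID_KEY) -> Optional[int]:
    """Return the browser PID from a New Session response body, or None if absent.

    A successful response has the shape
    {"value": {"sessionId": ..., "capabilities": {...}}}.
    """
    payload: Any = json.loads(body)
    capabilities = payload.get("value", {}).get("capabilities", {})
    pid = capabilities.get(key)
    # Treat a missing or non-integer value as "this remote end does not expose a PID".
    if isinstance(pid, int) and not isinstance(pid, bool) and pid > 0:
        return pid
    return None


if __name__ == "__main__":
    sample = (
        '{"value": {"sessionId": "abc", '
        '"capabilities": {"browserName": "firefox", "moz:processID": 4242}}}'
    )
    print(browser_pid_from_new_session(sample))  # 4242
```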

@sadym-chromium and @gsnedders would that work and could you consider such a capability as well?

OrKoN commented 3 months ago

There could be security concerns with exposing the process ID (not an expert here), since WebDriver endpoints are available over the network. But I think there is a universal workaround that is only available if you have local access to the process: for each launched browser instance, one could generate a unique ID and provide it as an argument (via {browser}Options?) to the binary, then find the process that contains that ID. So for example, chrome --myRun123 and ps -A | grep myRun123.
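The marker workaround can be sketched in Python. This is a minimal, Linux-only sketch that reads `/proc` directly; the `--wpt-run-<hex>` switch and all function names are hypothetical. In a real run the marker would be passed through the browser arguments (e.g. `goog:chromeOptions.args` or `moz:firefoxOptions.args`). On Linux, plain `ps -A` prints only the executable name, so matching has to look at full command lines (as `ps -Ao pid,args` or `pgrep -f` would). Because a browser may pass switches on to child processes, the sketch picks the matching process whose parent does not also match.

```python
import os
import subprocess
import sys
import uuid
from typing import Dict, List, Optional


def make_run_marker() -> str:
    """Return a unique, hypothetical command-line switch for one browser launch."""
    # Assumes the browser ignores switches it does not recognize.
    return f"--wpt-run-{uuid.uuid4().hex}"


def _read_cmdline(pid: int) -> List[str]:
    # /proc/<pid>/cmdline stores argv entries separated by NUL bytes.
    with open(f"/proc/{pid}/cmdline", "rb") as f:
        raw = f.read()
    return [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]


def _read_ppid(pid: int) -> int:
    # The parent PID is the 4th field of /proc/<pid>/stat. Split after the last ")"
    # because the command name (2nd field) may itself contain spaces or parentheses.
    with open(f"/proc/{pid}/stat", "rb") as f:
        stat = f.read().decode(errors="replace")
    return int(stat.rsplit(")", 1)[1].split()[1])


def find_pids_by_marker(marker: str) -> List[int]:
    """Return the PIDs whose argv contains the marker (Linux /proc only)."""
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        pid = int(entry)
        try:
            if any(marker in arg for arg in _read_cmdline(pid)):
                matches.append(pid)
        except OSError:
            continue  # the process exited or is unreadable; skip it
    return sorted(matches)


def find_main_pid(marker: str) -> Optional[int]:
    """Return the single matching process whose parent does not also match, else None."""
    parents: Dict[int, int] = {}
    for pid in find_pids_by_marker(marker):
        try:
            parents[pid] = _read_ppid(pid)
        except OSError:
            continue
    roots = [pid for pid, ppid in parents.items() if ppid not in parents]
    return roots[0] if len(roots) == 1 else None


if __name__ == "__main__":
    marker = make_run_marker()
    # Stand-in for the browser: a sleeping Python process that carries the marker.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)", marker])
    try:
        print(find_main_pid(marker) == child.pid)  # True on Linux
    finally:
        child.terminate()
        child.wait()
```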

gsnedders commented 2 months ago

My main question here is "what is reasonable behaviour when these are running on separate devices", especially if there are intermediate nodes involved.

And, like, getting a PID for iOS (and I believe processes can get their own PID via API?) just isn't very useful. I don't know how useful it is on Android, but on iOS you almost certainly can't do much with it, given the sandboxing model.

It's also only giving a single PID, which may or may not be that useful depending on browser architecture design, and whether that PID is actually useful. One can imagine cases where you might want to know the PIDs of other processes (be they web content processes, networking processes, etc.) too.

whimboo commented 2 months ago

It's also only giving a single PID, which may or may not be that useful depending on browser architecture design, and whether that PID is actually useful. One can imagine cases where you might want to know the PIDs of other processes (be they web content processes, networking processes, etc.) too.

Note that there is an open issue on GitHub (https://github.com/w3c/webdriver-bidi/issues/397) for obtaining details about all processes using WebDriver BiDi. For WebDriver Classic, we could implement a minimal solution by just returning the main PID as a capability.

The primary reason why we return it with Firefox is to facilitate restarts of the browser for Marionette-specific tests. Without an easy way to get the main PID, tracking the browser process after it forked itself would be significantly more difficult.
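A rough illustration of why a known main PID helps with restarts: a harness can record the PID before asking the browser to restart, then wait for that process to disappear before reconnecting and reading the new PID. This is not Marionette's actual code; the helper names are hypothetical, the check is POSIX-only, and it assumes the watched browser is not an unreaped child of the harness (a zombie process still answers signal 0).

```python
import os
import time


def is_process_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks the PID
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid: int, timeout: float = 30.0, interval: float = 0.1) -> bool:
    """Poll until the process is gone; return False if it is still running at the deadline."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not is_process_alive(pid):
            return True
        time.sleep(interval)
    return not is_process_alive(pid)
```

In use, the harness would call `wait_for_exit(old_pid)` after triggering the restart and only then compare the old PID with the one reported by the new session.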

alice commented 2 months ago

what is reasonable behaviour when these are running on separate devices

I had imagined that this API would only be available when directly communicating with a browser on localhost, or at least when the client, all the intermediate nodes and the browser are all on localhost.
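One way a local end could restrict itself to that case is to check that the remote end URL resolves only to loopback addresses before trusting a returned PID. A minimal sketch with a hypothetical helper name; it only sees the first hop, so it cannot tell whether a local intermediate node forwards the session to another machine.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_loopback_endpoint(url: str) -> bool:
    """Return True if the remote end's host resolves only to loopback addresses."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    # sockaddr[0] is the address string; drop any IPv6 zone suffix such as "%eth0".
    addresses = {info[4][0].split("%", 1)[0] for info in infos}
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)
```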

getting a PID for iOS just isn't very useful

I'm not sure the type of test we want to write is possible at all on iOS (I believe only the OS can access accessibility information), so for us the question there is moot, anyway.

I don't know how useful it is on Android

Yeah, I think for our purposes we'd have to find another mechanism for finding the application on mobile OSes. For Android, some cursory research suggests you can do some things with a PID, but it's not trivial to match it to an AccessibilityWindowInfo/AccessibilityNodeInfo which I think is what we'd need.

It's also only giving a single PID

So far, it seems like the main PID is reliably correlated with the accessibility tree for the application, so it works for our purposes.

spectranaut commented 2 months ago

Thanks for the feedback and questions, everyone!

It seems like there is some interest and no blockers for returning the main process ID in the Capabilities -- when the browser is on the same machine, not a separate device/over the network. What are the next steps? Should I open a PR to continue the discussion?

@whimboo Just to be sure -- moz:processID is only returned when the browser is on the same machine? In that case this proposed feature will be the same as Marionette's current implementation.

whimboo commented 2 months ago

@whimboo Just to be sure -- moz:processID is only returned when the browser is on the same machine? In that case this proposed feature will be the same as Marionette's current implementation.

Firefox itself, including geckodriver, cannot determine whether the client runs on the same machine. I think removing the process ID from the capabilities should rather be a feature of an intermediate node that acts as a proxy and forwards the commands and responses.
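Such filtering in an intermediate node might look roughly like this in a hypothetical Python proxy. How the proxy decides `client_is_local` (for example from the peer address of the incoming connection) is left open, and the list of PID capability keys is an assumption beyond `moz:processID`.

```python
import copy
from typing import Any, Dict, Iterable

# "moz:processID" is the key Firefox already returns; a standardized key
# (if one is adopted) would be added here.
PID_CAPABILITY_KEYS = ("moz:processID",)


def filter_new_session_response(response: Dict[str, Any],
                                client_is_local: bool,
                                keys: Iterable[str] = PID_CAPABILITY_KEYS) -> Dict[str, Any]:
    """Return a copy of a New Session response, without PID capabilities
    unless the client runs on the same machine as the browser."""
    result = copy.deepcopy(response)  # never mutate the upstream response
    if not client_is_local:
        capabilities = result.get("value", {}).get("capabilities", {})
        for key in keys:
            capabilities.pop(key, None)
    return result
```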

Feel free to open a PR with the suggested changes. Then we can indeed continue to discuss this topic based on the actual proposal. Thanks.

gsnedders commented 2 months ago

I think that removing the process id from the capabilities should be more a feature of an intermediate node which actually acts as a proxy and forwards the commands and responses.

That doesn't work for safaridriver, which can directly control a remote device, though in that case the endpoint node knows that it is connected to a remote device.

spectranaut commented 2 months ago

Ok, I opened a PR: https://github.com/w3c/webdriver/pull/1833