Closed Ms2ger closed 7 years ago
In more detail, the goal of this project is to implement a "metacircular" WebDriver client. This will have an API implemented in JavaScript that runs in the browser. The API will use an HTTP or WebSockets backend to call a WebDriver client implemented in Python in the wptserve server. This Python code will in turn call back into the browser using the WebDriver API, enabling human-like interactions to happen from tests written purely in cross-browser JavaScript.
For example, a developer might run `webdriver.click(element).then(function() { assert_false(element.disabled); })`. The `click` call would result in a request containing some selector that could be used to locate the element, allowing the server to send a `Find Element(selector)` call followed by a `Click(element)` call. It would then return a response, causing the promise to resolve.
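As a rough sketch of the in-page half of that flow, assuming nothing about the eventual API: `createWebDriverClient`, `selectorFor`, the `send` transport callback, and the message shape are all hypothetical names invented for illustration. The selector builder is deliberately naive (it does not escape ids).

```javascript
// Hypothetical in-page client. `send` stands in for the HTTP or WebSocket
// transport to wptserve and is assumed to return a promise.

// Build a CSS selector the server could hand to Find Element.
// Simplification: ids are used unescaped; real code would need CSS.escape.
function selectorFor(element) {
  if (element.id) {
    return "#" + element.id;
  }
  const parts = [];
  let node = element;
  // Walk up to the root, recording each node's position among its siblings.
  while (node && node.parentElement) {
    const siblings = node.parentElement.children;
    const index = Array.prototype.indexOf.call(siblings, node) + 1;
    parts.unshift(node.tagName.toLowerCase() + ":nth-child(" + index + ")");
    node = node.parentElement;
  }
  return parts.length ? parts.join(" > ") : ":root";
}

function createWebDriverClient(send) {
  return {
    // Resolves once the server reports that the click was performed.
    click(element) {
      return send({ command: "click", selector: selectorFor(element) });
    },
  };
}
```

A test would then chain on the returned promise, as in the `webdriver.click(element).then(...)` example above.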
Are there any examples of us doing something like this? Maybe pieces of code that go one way, but not fully circular?
Is there a specified set of tests which should be automated as a result of implementing this?
@jmaher I have previously done prototypes that do similar, creative things with WebDriver.
What makes this “circular” is that wptrunner already manages the browser process and a session to it, in order to run the testharness.js-based tests. The proposal is to re-purpose this session from an in-test JS client so that content can issue, e.g., trusted events on itself, and then make the assertions using content querying.
We hope to apply this method to tests that are currently impossible to automate, e.g. manual tests. It also opens up test scenarios that one may previously have avoided writing tests for. Other examples of potent use might be security dialogues and permissions APIs. We had talks with the WebBluetooth WG at TPAC in Sapporo about the feasibility of testing that.
One note of concern with this is that WebDriver officially provides a blocking API. wptrunner currently uses the Execute Async Script command to wait for the test harness to hand it back the results once testharness.js finishes.
Because of this, any calls from the in-content JS client would be queued inside WebDriver and not execute until the Execute Async Script call finishes, causing a race condition. From an implementation perspective, Marionette (“GeckoDriver”) is designed in such a way that it does not block like this and expects the client to maintain ordering, so as far as I can see there’s no technical reason it’s not possible to accept more than one HTTP request at a time.
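To illustrate what "the client maintains ordering" could look like if the driver accepted concurrent requests, here is a minimal sketch of a serial command queue. `createCommandQueue` and `enqueue` are hypothetical names; this is not how wptrunner or Marionette is implemented.

```javascript
// Hypothetical helper: run asynchronous commands strictly one at a time,
// in the order they were enqueued.
function createCommandQueue() {
  let tail = Promise.resolve();
  return function enqueue(command) {
    // Start this command only after everything queued before it has settled.
    const result = tail.then(() => command());
    // Keep the chain usable even if this command rejects.
    tail = result.catch(() => {});
    return result;
  };
}
```

Each caller still gets its own promise back, so a rejected command surfaces to whoever issued it without stalling the commands queued behind it.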
We wouldn't use Execute Async Script for running these tests, I expect.
FYI: WebRTC Working group is also interested in testing security dialog and permission APIs using web drivers.
I'm interested in this, specifically to run some of the touch-action tests in the pointerevents suite (which are mostly manual right now) in automation for Gecko. I have no experience with WebDriver at all, so I spent some time today poking around. As I understand it the "Actions API" of the WebDriver spec allows for this functionality. Is that different from what's proposed in this issue?
Also my understanding now is that for Gecko, the webdriver "server" component is implemented by the geckodriver project, which in turn acts as a marionette client and passes the webdriver commands on to the marionette server packaged with firefox. Unfortunately it also looks like the webdriver-rust project, which geckodriver builds on, completely omits the Actions API, so if I want this to work I'd have to implement those pieces in webdriver-rust/geckodriver as well as hook it up to new marionette code. Can anybody confirm or correct me if I'm wrong?
You are correct in your observations about the general architecture.
Unfortunately the Actions API chapter in the specification is so misleading that no one has implemented it yet. @jgraham is working on replacing it with something more coherent. You can see his current progress in https://github.com/jgraham/webdriver-actions.
For completeness, I should add that there exists an actions API implementation in Marionette, but it matches neither Selenium’s, nor what is currently described in the spec, nor what is in James’ draft.
I also wrote https://sny.no/2016/07/bttup last week to summarise the WebDriver working group’s meeting in mid-July. You may find this interesting.
@staktrace I should also add to the above that since you seem to know a few things about the pointer events specification, it would be very useful to us if you had any input on the way WebDriver creates an abstraction over it.
/cc @NavidZ @tdresser @summerlw who are working on pointer events and automated input testing in blink. Our current (far from perfect) WPT automation scripts are here.
Also @scottgonzalez who has this WebDriver-based automation of the pointer events web-platform-tests.
@andreastt @jgraham I read through jgraham's webdriver-actions notes and the main concern I have is that the semantics document describes stuff in terms of trusted DOM events. In Firefox, user input (at least touch/mouse) that we receive from the OS first goes through the APZ code, and then dispatches the DOM event, assuming APZ doesn't consume it completely. So it's possible that a touch tap sequence (touchdown/touchup) is performed by the user but never actually triggers the relevant DOM events. Instead, that tap might get consumed by the APZ to stop a fling that was already in motion. My understanding is that Edge also behaves similarly in some cases. So what I would like to see is that the semantics of the actions be described in terms of platform/OS input, rather than trusted DOM events, otherwise we will not be able to use the actions API to simulate what user input would actually do.
Defining the semantics in terms of platform input also allows the individual browsers to use their normal event firing pattern for a given input. The document right now has a bunch of TODOs around this but seems to specify firing PointerEvent events specifically, rather than TouchEvent or MouseEvent events. In Gecko currently, PointerEvents are generated from the TouchEvent/MouseEvent instances and so attempting to fire a PointerEvent without the TouchEvent/MouseEvent would not be in line with what the browser would do normally, making the API behaviour not representative of the browser.
Assuming the above concerns are resolved, the way the API as a whole is structured seems fine to me. Allowing the test to specify the different inputs with pauses and dispatch them in bulk seems like a good idea and should be general enough to simulate a wide range of user behaviour.
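A sketch of what "specify the inputs with pauses, then dispatch them in bulk" might look like from a test's point of view. The builder, its method names, and the field names in the payload are invented for illustration and are not taken from any version of the draft discussed here.

```javascript
// Hypothetical builder: collect a touch sequence, including pauses, into
// one plain object that could be sent to the server in a single request.
function createTouchSequence(id) {
  const actions = [];
  const builder = {
    down(x, y) { actions.push({ type: "down", x, y }); return builder; },
    move(x, y) { actions.push({ type: "move", x, y }); return builder; },
    up() { actions.push({ type: "up" }); return builder; },
    pause(ms) { actions.push({ type: "pause", duration: ms }); return builder; },
    // Return a copy so later builder calls cannot mutate a built sequence.
    build() { return { id, source: "touch", actions: actions.slice() }; },
  };
  return builder;
}

// A tap held for 50 ms, then a short drag, described up front.
const sequence = createTouchSequence("finger1")
  .down(10, 10).pause(50).up()
  .down(10, 10).move(10, 60).up()
  .build();
const payload = JSON.stringify({ sequences: [sequence] });
```

Because the whole sequence is described before anything is dispatched, the browser is free to inject it at whatever level (OS input or otherwise) the final semantics require.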
For the moment I have automated the touch-action manual tests by wrapping them into a mochitest where I use DOMWindowUtils APIs to synthesize "native" inputs (which behave the way I described above - injecting user input at the OS level). This is the Firefox equivalent of the Blink/PEP code that @RByers linked to in the previous comment.
@staktrace I believe that's exactly the concern I raised at https://github.com/jgraham/webdriver-actions/issues/1. It seems like everyone agrees on what the semantics should be and the debate is just how to do the spec legalese to express those semantics in a way that's both interoperable and well defined. My instinct is not to worry too much about the spec wording at this stage - if we design the API well and have multiple high quality implementations, we'll eventually figure out how to write a decent spec ;-)
The concerns that @staktrace and @RByers have raised are entirely valid, and I think you are both describing exactly what we mean for the browsers to implement.
The challenge, as I say in https://github.com/jgraham/webdriver-actions/issues/1#issuecomment-243394313, is to describe this in such a manner that it is sufficiently high-level to define the exact ordering and which DOM events to expect so as to refer to well-defined platform concepts, without being too low-level and dwell on the mechanisms individual browser vendors would use to achieve that. E.g. the steps Blink would take are likely very different from those of Edge and Firefox.
The WebDriver specification explicitly states that the algorithms describe the expected output and not necessarily step-by-step what the implementation must do. For as long as the result of a command matches what the algorithm describes, the implementation is considered conformant.
To the point that @staktrace brings up on how this could be implemented in Gecko using DOMWindowUtils et al., that is in fact exactly how Marionette currently implements actions!
> To the point that @staktrace brings up on how this could be implemented in Gecko using DOMWindowUtils et al., that is in fact exactly how Marionette currently implements actions!
Not quite - we actually have two sets of DOMWindowUtils APIs. There's one set with functions like sendTouchEvent, which will create the event and dispatch it into the DOM directly (and synchronously). There's another set with functions like sendNativeTouchEvent, which will inject the events asynchronously into the pipeline so that they're as indistinguishable from native OS events as possible. In fact, on many platforms it will actually call OS APIs to inject the simulated input into the OS event queue. The former set of APIs are what Marionette uses, the latter is what I would like it to use. The latter is also what my pointer events and APZ tests use.
> There's another set with functions like sendNativeTouchEvent, which will inject the events asynchronously into the pipeline so that they're as indistinguishable from native OS events as possible.
OK.
We’re getting into Marionette specifics here, but for a lot of different reasons drivers have over a long period transitioned away from native events because they come with a lot of inherent problems.
I feel we should continue this discussion in an appropriate Mozilla forum, but I just want to say there’s nothing preventing us from exposing the set of native event functions from DOMWindowUtils as an opt-in, perhaps through a capability.
Just for reference: in Chrome there is the chrome.debugger extensions API, which exposes to JavaScript the same API that ChromeDriver uses via the remote debugging protocol. It allows native events to be dispatched from JavaScript. Maybe Marionette could expose something similar, for unification?
That is exactly the API I'm using in my implementation of such a "circular" JavaScript WebDriver client, which allows tests to run in the same browser instance: https://github.com/vitalets/autotester
I've imported several tests from the Selenium JS binding and made them pass, which confirms @andreastt's comment:
> The WebDriver specification explicitly states that the algorithms describe the expected output and not necessarily step-by-step what the implementation must do
There's been much discussion about automation on the mailing list over the past few months. I'm going to try and summarise it here:
An API:
I think the big open question is, "what is the behaviour when loading one of these tests when not in automation?"
We can:
I'd say 1, personally.
Assuming it's 1/2, then:
1 seems fine, although I would encourage tests to specify the actions for cases where it can be performed manually. "Click on this button" seems pretty possible. It might be interesting to allow the user to complete the action if there's no server and have that resolve the promise (assuming a promise-based API), but it could also be hard to implement. It's not a requirement in my mind.
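The "let the user complete the action" idea could be sketched roughly as follows. This assumes a promise-based API; `click`, the `automation` object, and the `showInstruction` callback are all hypothetical names, and only the `click` case is covered.

```javascript
// Hypothetical fallback: use the automation backend when one is available,
// otherwise ask the person running the test to perform the action and
// resolve the same promise when they do.
function click(element, automation, showInstruction) {
  if (automation) {
    return automation.click(element);
  }
  showInstruction("Click on this button");
  return new Promise((resolve) => {
    // Resolve on the first real click; the listener removes itself.
    element.addEventListener("click", () => resolve(), { once: true });
  });
}
```

The test body is the same in both modes; only the source of the click differs.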
The goal here is to allow `wptrunner` to automatically run (some) tests that would ordinarily require user interaction, through the WebDriver protocol.