annevk opened this issue 7 years ago
Is it important that everyone who uses web-platform-tests also gets test262 as part of it, or would it suffice if the tests are run on the same setup as wpt.fyi and published either on wpt.fyi or on a test262 results dashboard?
As I understand it, test262 attempts to be host-agnostic, just like ECMAScript itself. So while the web platform has many agents, other hosts might have just one. So if we want to run those tests in a window, a worker, a shared worker, or a combination thereof (in the case of `SharedArrayBuffer`), etc., I think that has to happen on the web-platform-tests side.
Various JavaScript engines can also run test262 directly, but that doesn't exercise quite the same code paths as running them through web platform agents.
Oh, so you're saying we'd run the tests at least in a window and worker context?
@foolip ideally all agents, including worklets (though only possible for audio worklets I think), service workers, and shared workers. That's the long term goal.
The short-term goal is making sure `SharedArrayBuffer` tests are run across all agent combinations, which similarly requires this kind of wrapper setup.
Perhaps reading https://gist.github.com/annevk/b15a0a9522d65c98b28fb8c6da9f0ae5 helps.
Thanks, that does help. Seems like a good start would be to pick a browser, write a wrapper for wpt, and run the tests against the similar-origin window agent using `wpt run`. See if there are any differences from the results of running the same tests against the JS engine directly. Then also run against the other agents and see what other differences show up.
Most likely, new bugs will be revealed. Depending on how many bugs, the tradeoff between running the same tests many times vs. finding bugs might look different.
How long does it currently take to run all of the tests?
Hi @foolip. I coded an attempt at what you suggested: https://github.com/dpino/gecko-dev/pull/2/commits/9641de06bae7bab0039223d2fd010e42c24ccb30
Basically it's a Perl script that prints out a WPT test with a customized list of test262 tests to run. In that commit I only support a DedicatedWorker, although it could be extended to other types of workers. The main issue with this approach was that it required writing wrappers for things that test262 uses (for instance, the assert commands are slightly different from what WPT supports) and, more importantly, when I tried to build up a long list of tests to run, the whole test timed out.
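For illustration only, here is a minimal sketch of the kind of shim the assert mismatch implies, assuming testharness.js is already loaded; this is not the wrapper from the linked commit, and the mapping is an assumption:

```js
// Sketch: map test262-style assertions onto testharness.js assertions.
// assert_true/assert_equals/assert_not_equals/assert_throws come from
// testharness.js; newer versions rename assert_throws to assert_throws_js.
function assert(value, message) {
  assert_true(!!value, message);
}
assert.sameValue = function (actual, expected, message) {
  assert_equals(actual, expected, message);
};
assert.notSameValue = function (actual, unexpected, message) {
  assert_not_equals(actual, unexpected, message);
};
assert.throws = function (ErrorConstructor, fn, message) {
  assert_throws(new ErrorConstructor(), fn, message);
};
```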
Although this approach can be interesting for trying out test262 in a browser, it doesn't sound like the right approach to me, dunno.
Unfortunately, the laptop I was using for this work crashed today, so I cannot check how long it takes to run the whole test262 or wpt suite. I will post those numbers once my laptop gets fixed (hopefully in a day or two).
I think this makes sense. I think WASM might do something similar. The details of how the integration should work are unclear to me; how will web-platform-tests be kept in sync with the test262 tests?
@foolip Running all of test262 on my laptop takes around 4 minutes. Not very useful information, though; I suppose you were asking about the time spent running all the tests as part of a CI infrastructure or similar.
```
$ ./tests/jstests.py build_OPT.OBJ/dist/bin/js test262
[27706| 0| 0| 1041] 100% ======================================>| 233.0s
PASS
```
I gave your suggestion a try (a wrapper that relies on `wpt run` to launch a test262 test in the browser). I pushed the changes to a remote branch: https://github.com/dpino/web-platform-tests/tree/test262-runner
I have several questions regarding web-platform-tests. Ideally, I think the test262 suite should be run by opening a browser and running all the tests in that same browser instance. With the wrapper above, each launch of a test opens and closes a new browser, so running the whole suite takes a very long time (even more so since the suite should run on different agents). Another approach could be to group several test262 tests together into a single WPT test. I don't know if it would be possible to have a single browser instance in which every test is run and which communicates the results back to the command shell.
web-platform-tests generally work with one instance of the browser running multiple tests.
The most obvious way to do this integration would be to generate testharness.js wrappers for the test262 tests and check in the generated files. These would then run like any other testharness.js test. It looks like that's more or less what's on your branch, except that you don't add all the files at once, and you call `wpt run` for every test rather than once.
There are more complex solutions we could imagine in which the templates are baked into the server, like with `.worker.js` files. I don't know if that's worthwhile.
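As a rough illustration of what a generated, checked-in wrapper might contain (an assumption, not the actual output of any existing generator; the file name and structure are placeholders, and `test()` comes from testharness.js, which the enclosing .html or .any.js file would load):

```js
// Hypothetical body of a generated wrapper for one test262 file.
test(() => {
  // The generator would paste the test262 source here (together with its
  // harness files such as assert.js and sta.js, mapped onto testharness.js).
}, "built-ins/Array/of/length.js");

test(() => {
  // A second copy prefixed with "use strict"; test262 runs files in both
  // strict and non-strict mode unless the test's metadata says otherwise.
}, "built-ins/Array/of/length.js (strict mode)");
```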
Thanks @jgraham for the clarification. Initially I thought web-platform-tests launched a new browser per test, but I was wrong.
I've updated the script quite a bit. Now I just use the script to generate the WPT wrappers from test262 test files and run them externally as normal WPT tests.
OTOH, some of the tests were failing or timing out. The issue was that some test262 tests modify built-in objects such as Array, which had a side effect on the web-platform-tests harness code. So I actually need to parse the source of the test and add code to undo the change once the test is over. Anyway, I'm still struggling with this.
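One way to picture the problem, as a sketch rather than what the script actually does: snapshot the builtins the harness depends on before running the test262 source and restore them afterwards. Which properties need protecting, and the `runTest262Source` helper, are assumptions made for illustration:

```js
// Sketch: protect a few builtins across the execution of a test262 file.
function withProtectedBuiltins(runTest262Source, source) {
  const saved = {
    arrayPush: Array.prototype.push,
    arrayJoin: Array.prototype.join,
    objectKeys: Object.keys,
  };
  try {
    runTest262Source(source); // hypothetical helper that evaluates the test body
  } finally {
    Array.prototype.push = saved.arrayPush;
    Array.prototype.join = saved.arrayJoin;
    Object.keys = saved.objectKeys;
  }
}
```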
Perhaps an alternative approach is to load the 262 test in an `<iframe>` and then use `onload` to inspect the result? Might not be as nice though, and come to think of it, it would not work in a worker and such. Seems those kinds of tests would be rather hard to do properly with a harness.
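A rough sketch of that iframe-plus-onload idea; the runner page name, the result global, and the report callback are all made up for illustration:

```js
// Sketch: run one test262 file in a fresh iframe and inspect the outcome
// once the iframe has loaded. "runner.html" and __test262Result are
// hypothetical; a real runner page would expose some such result.
function runInIframe(testPath, report) {
  const iframe = document.createElement("iframe");
  iframe.src = "runner.html?test=" + encodeURIComponent(testPath);
  iframe.onload = () => {
    report(testPath, iframe.contentWindow.__test262Result);
    iframe.remove();
  };
  document.body.appendChild(iframe);
}
```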
@annevk I can try running the tests in an iframe, at least for the same-origin window case, and see if I get more tests passing. Right now, running the test262/built-ins directory, which is the largest test262 directory, I get 1000 tests failing and 35 timeouts. Maybe some tests fail because a JS shell feature is missing in the browser (not all of them are implemented yet). I would need to look more into the failing tests.
The good thing about running the tests in the browser as web-platform-tests is reusing all the infrastructure for running tests and retrieving reports. But everything that has to do with instrumentation (Selenium/Marionette) is actually not useful for this case, IMHO. @jugglinmike told me about https://github.com/bterlson/test262-harness, which is a Node.js tool for running test262 in the browser (there's also https://github.com/bakkot/test262-web-runner). So maybe a similar tool that uses a WebSocket to communicate the results from the browser to a server process could be another approach. I don't know. Does it make sense? For the moment, I'm going to keep trying this approach.
I'm not sure; I'm not familiar enough with all the harnesses. I'm curious whether @bakkot has looked into running test262 in a worker environment.
I have a first version of the tests running. I reworked the script to run the tests inside an IFrame. Then I added support for other agents: child Window, DedicatedWorker, and SharedWorker. ServiceWorker is not supported yet; more on that later.
I used the results of Test262-Web-Runner as a baseline to compare my results against. I ran the tests on Firefox Nightly 59.0a1. First of all, here are the results for Test262-Web-Runner:
Test262-Web-Runner
Test | Ran | Failed |
---|---|---|
annexB | 977/1003 | 26 |
built-ins | 12743/13446 (skipped 32) | 703 + 32 |
harness | 94/94 | 0 |
intl402 | 231/236 | 5 |
language | 13917/14822 | 905 |
And here are the results of the web-platform-tests's wrappers for test262 (only IFrame in this benchmark):
Test | Ran | Expected results | Failed |
---|---|---|---|
annexB | Ran 2263 tests (1003 parents, 1260 subtests) | 2230 | 33 (FAIL: 33) |
built-ins | Ran 40188 tests (13478 parents, 26710 subtests) | 38748 | 1440 (FAIL: 1440) |
harness | Ran 275 tests (94 parents, 181 subtests) | 275 | 0 |
intl402 | Ran 708 tests (236 parents, 472 subtests) | 698 | 10 (FAIL: 10) |
language | Ran 43243 tests (14898 parents, 28345 subtests) | 41559 | 1684 (FAIL: 1684) |
This summary cannot be compared directly with the results of Test262-Web-Runner. By default, test262 tests are executed both in strict mode and in non-strict mode, unless a tag (onlyStrict, noStrict) indicates otherwise. For each test, the WPT wrapper therefore normally runs two actual tests, so when a test fails it likely counts as two failing tests. On the other hand, a test that fails in Test262-Web-Runner counts only once.
So to actually compare the WPT results with Test262-Web-Runner, I need to normalize the results using an expression like the following:
$ grep "FAIL IFrame" annexB.output | cut -d : -f 2 | sort -u | wc -l
Here are the normalized results for IFrame:
Test | Ran | Failed |
---|---|---|
annexB | 1003 | 26 |
built-ins | 13446 | 720 |
harness | 94 | 0 |
intl402 | 236 | 5 |
language | 14822 | 827 |
The results are almost the same as Test262-Web-Runner (I just noticed the results for 'language' are much worse, although I used to get better results in other runs; I will look into that **). Then I started adding support for the other agents. Here are the results for each type of agent:
** 08/01/2018: The values are updated now.
Window
Test | Ran | Failed |
---|---|---|
annexB | 1003 | 26 |
built-ins | 13446 | 720 |
harness | 94 | 0 |
intl402 | 236 | 5 |
language | 14822 | 833 |
Worker
Test | Ran | Failed |
---|---|---|
annexB | 1003 | 69 |
built-ins | 13446 | 1043 |
harness | 94 | 1 |
intl402 | 236 | 6 |
language | 14822 | 3827 |
SharedWorker
Test | Ran | Failed |
---|---|---|
annexB | 1003 | 69 |
built-ins | 13446 | 1059 |
harness | 94 | 2 |
intl402 | 236 | 6 |
language | 14822 | 3907 |
Regarding ServiceWorker, the reason I left it out for the moment is that for the currently supported agents I generate the tests on the fly (either an HTML page for IFrame and Window, or a JavaScript file for DedicatedWorker and SharedWorker) using a Blob object. However, it's not possible to create a ServiceWorker on the fly for security reasons. One possible workaround would be to generate the ServiceWorker files for each test beforehand. The con is that it would double the total number of files, but I think it would work.
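For illustration, a minimal sketch of the Blob technique for a dedicated worker, under the assumption that the worker reports back via postMessage; the payload shape and the report callback are made up, and a SharedWorker variant would look similar:

```js
// Sketch: build a worker script on the fly from the test262 source and run
// it in a dedicated worker created from a blob: URL.
function runInDedicatedWorker(testSource, report) {
  const script = testSource + "\npostMessage({ passed: true });";
  const url = URL.createObjectURL(new Blob([script], { type: "text/javascript" }));
  const worker = new Worker(url);
  worker.onmessage = () => { report(true); URL.revokeObjectURL(url); };
  worker.onerror = (e) => { report(false, e.message); URL.revokeObjectURL(url); };
}
```

Service workers cannot be created this way because registration requires a same-origin http(s) script URL, which is why pre-generating the files is suggested above.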
I fixed the issue that affected the results of the 'language' block test. The values are updated now.
I have pushed a PR with the script to generate the WPT wrappers as well as the harness code to run the tests. The PR is not ready to be merged yet, but I think it can be a starting point for getting feedback and discussing what's still to be done. PTAL https://github.com/w3c/web-platform-tests/pull/8980
Over in https://github.com/dpino/gecko-dev @dpino is working on ensuring that various `SharedArrayBuffer` tests from test262 run across the various agents defined by the web platform. A follow-up goal is to run all test262 tests across the various agents to ensure there are no weird bugs in the various JavaScript engines. The idea is to host this "wrapper test suite" in web-platform-tests so all user agents can benefit.
If anyone has thoughts, ideas, or concerns that'd be great to hear.
cc @jgraham @domenic @foolip @ljharb @leobalter
(Corresponding issue: https://github.com/dpino/gecko-dev/issues/21.)