web-platform-tests / wpt

Test suites for Web platform specs — including WHATWG, W3C, and others
https://web-platform-tests.org/

Extract standalone test data out from JavaScript #22988

Open ag-eitilt opened 4 years ago

ag-eitilt commented 4 years ago

Before I get into the body of the issue, I know that it's not going to be of any major importance; I'm mainly hoping to prompt some thought in its direction rather than expecting it to result in any fast changes. That said, I'm tired and coming off of wrestling with the code, so I may not be smoothing things over as much as I otherwise might.

WPT seems a great tool for evaluating how close a browser hews to the specs, and for figuring out where to focus for improvement. However, the reliance on JavaScript effectively restricts that to evaluating existing, rich browsers: there's no way to test Lynx, for example, simply because it doesn't support scripting; likewise, and probably more relevant to developers, stripped-down reading modes (depending on how they're implemented), browsing contexts where scripting is disabled, and (as far as I know) most accessibility layers all have to rely on their parent browsers/views passing the tests, and can't do more than hope the thinner ones do as well. That cuts out a lot of user agents and methods of interaction that the HTML spec, at the least, makes a point of supporting.

Certainly a lot of that can't be helped (how can you programmatically check the results of any given rendering otherwise?) and a lot more falls back to JS by default (what exists of the geolocation interface after you rip out all the scripting?), but there are still quite a few places where that reliance can be reduced, and the potential coverage increased accordingly. I'm mainly looking at low-level things like the encoding tree here, where I've just recently had to go through and either manually copy data over or write fragile extraction filters, because everything's embedded in <script> tags, .js files, or, worse, remotely loaded into an <iframe> and checked from there.

At times, the design seems actively hostile to developing new engines. I might not be expecting my personal exercise to reach any great heights, but I'd still like it to be correct, and when I have to fight the official test suite every inch of the way just to get half of the tests working, I'm not filled with a whole lot of confidence or, for that matter, good will.

I'd like to propose two things. First, where possible, the data portions of the tests should be extracted into standalone, pure data files which can then be loaded indirectly by the primary loader or easily parsed by independent test suites (the array in textdecoder-fatal.any.js, among others, is already basically JSON). Second, sample files to be embedded or called should be given canonical reference files in some relevant form (UTF-8 for encodings, the result of the serialization algorithm for tree-building, etc.), which can then be compared however may be best for the test driver, rather than arbitrarily requiring support for assert_equals(window.frames[i].window.document.body.textContent, "\uFEFF");
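To make the first half of that concrete, here is a sketch of what an extraction could look like for the textdecoder-fatal cases. The JSON layout and field names are hypothetical illustrations, not the current WPT format; the point is that once the cases are plain data, any runtime with a TextDecoder (Node's is used here as a stand-in) can consume the same file:

```javascript
// Hypothetical standalone data file, in place of the array currently
// embedded in textdecoder-fatal.any.js. Inlined here as a string for
// the sake of a self-contained example.
const cases = JSON.parse(`[
  {"encoding": "utf-8", "input": [255], "name": "invalid byte"},
  {"encoding": "utf-8", "input": [237, 160, 128], "name": "surrogate half"}
]`);

// Each case expects a fatal-mode decoder to reject the input with a
// TypeError. Nothing here depends on a browser environment.
function runFatalCases(cases) {
  return cases.map(({ encoding, input, name }) => {
    let threw = false;
    try {
      new TextDecoder(encoding, { fatal: true }).decode(Uint8Array.from(input));
    } catch (e) {
      threw = e instanceof TypeError;
    }
    return { name, pass: threw };
  });
}

console.log(runFatalCases(cases));
```

The same JSON file could still be loaded by the existing .any.js harness in browsers, so nothing is lost on that side.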

Yeah, the benefits may quickly peter out when the tests get into more complex behaviours and specs, but by that point the JS-tied tests make more sense anyway. When a full script engine is required just to bootstrap checking that a byte-order mark is decoded correctly, things have gotten over-designed and over-specialized. Even basic CSS can probably be distilled into a format which would allow arbitrary comparison (inline style=""? JSON like [{"color": "black", "content": "This text should be "}, {"color": "#008000", "content": "green"}]? An image if all else fails?). As it is now, WPT may be better (if harshly) characterized as a means of comparing the big browsers than as a test suite for the specs.
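The CSS idea above can be sketched as well. The "styled runs" format is purely hypothetical: an engine under test would serialize what it intends to paint into a list of runs, and a harness, in any language, deep-compares that against a canonical JSON reference:

```javascript
// Hypothetical canonical reference: the expected rendering distilled
// into styled text runs, as suggested above.
const expectedRuns = [
  { color: "black",   content: "This text should be " },
  { color: "#008000", content: "green" }
];

// An engine-agnostic comparison: any implementation that can emit the
// same structure from its own style resolution can be checked, with no
// script engine required inside the engine under test.
function runsEqual(expected, actual) {
  return expected.length === actual.length &&
    expected.every((run, i) =>
      run.color === actual[i].color && run.content === actual[i].content);
}

console.log(runsEqual(expectedRuns, expectedRuns));
```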

guest271314 commented 4 years ago

However, the reliance on JavaScript effectively restricts that to evaluating existing, rich browsers: there's no way to test Lynx, for example, simply because it doesn't support scripting

Have you tried testing Lynx's implementation of a specification?

Currently https://wpt.fyi/results/?label=experimental&label=master&aligned does not render the markup <wpt-app labels="[&quot;experimental&quot;,&quot;master&quot;]" aligned="" page="results"></wpt-app> and its child <table> element in Lynx. That output could probably be adjusted to at least provide an HTML version of the results for text-based browsers.

I'd like to propose two things. First, where possible, the data portions of the tests should be extracted into standalone, pure data files which can then be loaded indirectly by the primary loader or easily parsed by independent test suites (the array in textdecoder-fatal.any.js, among others, is already basically JSON). Second, sample files to be embedded or called should be given canonical reference files in some relevant form (UTF-8 for encodings, the result of the serialization algorithm for tree-building, etc.), which can then be compared however may be best for the test driver, rather than arbitrarily requiring support for assert_equals(window.frames[i].window.document.body.textContent, "\uFEFF");

Have you composed a draft of the proposed implementation?

ag-eitilt commented 4 years ago

However, the reliance on JavaScript effectively restricts that to evaluating existing, rich browsers: there's no way to test Lynx, for example, simply because it doesn't support scripting

Have you tried testing Lynx's implementation of a specification?

Currently https://wpt.fyi/results/?label=experimental&label=master&aligned does not render the markup <wpt-app labels="[&quot;experimental&quot;,&quot;master&quot;]" aligned="" page="results"></wpt-app> and its child <table> element in Lynx. That output could probably be adjusted to at least provide an HTML version of the results for text-based browsers.

Even if the output is tweaked to render correctly, a lot of text browsers, Lynx included, don't actually process the JavaScript required to generate the results. Loading encoding/sniffing.html, say, in the Chromium-derived browser I primarily use displays a summary with the harness status, the test successfully passing, etc., but loading it in Lynx, ELinks, or, for that matter, Chromium with NoScript enabled just displays the "Hello World!" test content itself, surrounded by a variety of high-bit characters. Using Chromium as the model, I can guess that the other two don't correctly implement the algorithm, but that requires manual visual comparison.

There's not really any way around that in-browser, and I wouldn't argue for one, but separating out the data would at least allow the developers to easily write their own runner if they were so inclined.

I'd like to propose two things. First, where possible, the data portions of the tests should be extracted into standalone, pure data files which can then be loaded indirectly by the primary loader or easily parsed by independent test suites (the array in textdecoder-fatal.any.js, among others, is already basically JSON). Second, sample files to be embedded or called should be given canonical reference files in some relevant form (UTF-8 for encodings, the result of the serialization algorithm for tree-building, etc.), which can then be compared however may be best for the test driver, rather than arbitrarily requiring support for assert_equals(window.frames[i].window.document.body.textContent, "\uFEFF");

Have you composed a draft of the proposed implementation?

I've not yet, mostly because I wasn't sure how it would have been received -- I've been on the sidelines of other projects where something like that would be rejected out of hand. Since you seem to at least be entertaining the idea, I'll put something together, probably in the next week or so. (The trickiest part will likely be figuring out something that works for the majority of test suites, rather than just being tailored for a small number, so we'll see how long that takes.) Thanks!

guest271314 commented 4 years ago

Even if the output is tweaked to render correctly, a lot of text browsers, Lynx included, don't actually process the JavaScript required to generate the results

A <form> could be used in Lynx to at least run HTML and accessibility tests.

From the perspective here, the test results should at least be accessible in text-based browsers, whether as plain text, JSON, HTML, etc.

but separating out the data would at least allow the developers to easily write their own runner if they were so inclined.

What is the "data" and "test data" that you are referring to? The test itself, to be run in a custom text-based browser with JavaScript enabled?

A custom version of Chromium could be built, or headless version used.

Am not entirely sure what you are trying to achieve.

I've not yet, mostly because I wasn't sure how it would have been received

In any case and any field of human activity: Roll your own. Make "it" yourself. "It" is your idea anyway. Manifest "it", without regard for how it is "received", or some arbitrary restriction someone or institution imposes upon themselves.

My confidence is up, I believe with all my soul
I could do anything that I put my heart into
I spend all of my time focused in the lab
Comin' up with these songs, masterin' my craft
'Cause I'ma need that boat or one of them aircrafts
God gave me a life and I'ma live this shit
So, wassup? Don't get your head bust up
You're in my way, young brother
Get out my way, young brother
Or I'ma run you down
You should have accepted my offer
Brush me off like it's nothin', I understand, no it's okay
Run along now, you interrupting
The illest rapper still active
More dope than all of my past shit
I was just a boy, I was only 19
But my mind be on some old next shit

Tyranny, The Hegelian Dialectic (The Book of Revelation), Prodigy of Mobb Deep https://genius.com/Prodigy-of-mobb-deep-tyranny-lyrics

foolip commented 4 years ago

@ag-eitilt as you've found, there are some places where the test data is already factored out in a way that makes it possible, in principle, to use in other contexts. /url/resources/urltestdata.json is another such case I'm aware of.
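As an illustration of how far that already goes, entries in the urltestdata.json shape can be consumed directly by a non-browser harness. This sketch inlines two entries mimicking that shape (string comments; objects with input/base; "failure": true for inputs that must be rejected; the real file carries more expected fields per case) and checks them against Node's WHATWG URL implementation:

```javascript
// Two inline entries mimicking the urltestdata.json shape; consult the
// real file for the full set of expected fields.
const entries = [
  "# Example entries in the urltestdata.json shape",
  { input: "/path", base: "http://example.org/", href: "http://example.org/path" },
  { input: "http://[", base: "about:blank", failure: true }
];

// Returns null for comment lines, true/false for actual cases.
function checkEntry(entry) {
  if (typeof entry === "string") return null;
  try {
    const url = new URL(entry.input, entry.base);
    return !entry.failure && url.href === entry.href;
  } catch {
    return Boolean(entry.failure);
  }
}

const results = entries.map(checkEntry).filter(r => r !== null);
console.log(results); // [ true, true ]
```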

If there is test data that's close to being reusable and separating it out won't make things more convoluted, then people might be receptive to PRs doing just that.