gsnedders opened 6 years ago
I think there's rough consensus across many of us that keeping the extra rules isn't workable longer term, because it stops us from doing large-scale cleanup of the CSS testsuite (most obviously, getting rid of the large number of duplicate files within it, which currently all have to be changed simultaneously). I think 4 above is the right option (2 and 3 seem particularly unlikely: despite the CSS WG caring significantly about the test harness, nobody has been paid to address any of its issues in a long time).
In the short term, I think at the very least we should make test.csswg.org run the build system only on `css/` (currently this causes two tests to disappear: `html/dom/elements/global-attributes/style-01.html` and `html/rendering/non-replaced-elements/the-fieldset-element-0/min-width-not-important.html`), which addresses 4 in the list of things we don't lint. (I think that requires @plinss to do that, however.)
We've been breaking the CSS Grid Layout test suite every now and then without noticing (see #9071 & #9089); basically it goes unnoticed until someone manually catches the issue on https://test.csswg.org/.
There are some special things in the CSS tooling that one can easily overlook, forget, or miss when writing or reviewing a new test, for example the restrictions on where tests may put their `.css`, `.js`, or `.png` support files. This means we have to duplicate files (which is clearly not good), or put them in common folders (which is bad if the file is specific to a spec) and use absolute paths (which is bad because you wouldn't be able to just open the test without running the server).
I agree with @gsnedders that the path to follow seems to be 4., making https://wpt.fyi/ (or other tools) better cover the use cases of the CSS build system (such as running manual tests and storing results). It'd be nice to understand how people are using the CSS test harness today and to extract all the requirements that are mandatory. With that information we could look for the best way to move things forward.
The key functionality that I've been relying on lately is the CSSWG test harness's ability to:
- run manual tests (including manual reftest comparison, to work around antialiasing bugs) in most-needed order
- record results during those manual runs, and integrate them with imported results from automated harnesses
- report and analyze the test results, to help us draw up implementation reports, analyze the state of the spec, and file bugs against implementations
As for the build system, I've mainly just used its indexes to help analyze spec coverage; and this sort of indexing system is easy to rebuild if necessary. In the past the build needed to convert XHTML<->HTML, since some CSS systems could only read XHTML files, but that seems less necessary now that the HTML parsing algorithm is well-specced and more widely implemented. At this point my main concern in this area would be people working with EPUB engines, and it might be worth talking to e.g. @dauwhe and his colleagues to see how they're using (or unable to use) our web platform tests.
The last concern I have is that the use of absolute paths requires the tests to be served up on a server, which makes WPT harder to use and develop locally. This doesn't affect folks who are comfortable with tooling much, but we have CSS test contributors who struggle with that. Hardly any of the CSS tests actually need a server, and so up until now it was possible to work with tests without setting up anything other than a browser and a text editor; but using absolute paths for basic resources like images or fonts changes that.
@gsnedders your solution #3 actually shouldn't be very hard. Removing the file format conversions and output restructuring from the build should be straightforward. The only thing the build would really need to produce would be the manifest files to feed the test harness. But getting the basic needs met by the normal wpt infrastructure would be my preference as well (#3 could be an interim step).
IIRC the reason we made the CSS build scan the entire repo was that the CSSOM tests got moved out of `/css`; looks like those are back in `/css` now (or never quite got moved).
@mrego you can use relative paths in the css tests for `.css`, `.js`, or images; they just can't go up (`../`) and need to be either in the same directory or in a `./support` directory. There's no difference between the handling of `.css`, `.js`, or images. The only relative paths that get fixed up currently are for reference files.
Two other things the test harness does: 1) it orders the execution of tests by which are most needed for the current browser (e.g. tests that need a first or second pass to exit CR and don't have one for the browser being used; there's actually more to the ordering, but that's the intent); 2) it has an API to the results, used by the CSS spec annotations, indicating both test coverage and pass status directly in the specs.
Relative paths to the same directory didn't work for me (but `support/` did), and needed to be fixed in #9912. (That's the sort of rule that's easy to mess up.)
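Rules like this could in principle be checked mechanically. As a minimal sketch, assuming a crude regex over the markup and the rule set described in this thread (neither is the actual wpt lint; `check_resource_paths` is a hypothetical name):

```python
import re

# Crude attribute/url() scraper; a real lint would parse the document properly.
_REF_RE = re.compile(r'''(?:src|href|url\()\s*=?\s*["']?([^"')\s>]+)''')

def check_resource_paths(markup):
    """Return relative references that break the assumed CSS-build rules:
    no going up with ../, and subdirectory references only via ./support/."""
    bad = []
    for ref in _REF_RE.findall(markup):
        if "://" in ref or ref.startswith(("/", "#", "data:")):
            continue  # absolute, root-relative, fragment, or data: URLs are fine
        if ref.startswith("../"):
            bad.append(ref)  # going up is never allowed
        elif "/" in ref and not ref.startswith(("support/", "./support/")):
            bad.append(ref)  # only the ./support/ subdirectory is allowed
    return bad
```

For example, `check_resource_paths('<img src="../support/cat.png">')` flags the reference, while same-directory and `/resources/...` paths pass.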
The last concern I have is that the use of absolute paths requires the tests to be served up on a server, which makes WPT harder to use and develop locally. This doesn't affect folks who are comfortable with tooling much, but we have CSS test contributors who struggle with that. Hardly any of the CSS tests actually need a server, and so up until now it was possible to work with tests without setting up anything other than a browser and a text editor; but using absolute paths for basic resources like images or fonts changes that.
Yes, I was mentioning that; it's quite annoying. But that's needed, as you cannot use `../` in the paths.
BTW, right now 1446 tests under `css/` use `testharness.js`, which is linked as an absolute path: `<script src="/resources/testharness.js"></script>`
@mrego you can use relative paths in the css tests for `.css`, `.js`, or images; they just can't go up (`../`) and need to be either in the same directory or in a `./support` directory. There's no difference between the handling of `.css`, `.js`, or images. The only relative paths that get fixed up currently are for reference files.
Yes, but if you decide to have a subfolder per spec section (like we have in grid layout, e.g. `css/css-grid/grid-model/`), then you need to duplicate the `support` directory and helper files in several places.
And yeah, it seems there's no difference between `.css` and other files. Dunno why we got confused about that, or why nobody caught the issue either, but we were doing this wrong for grid too. :disappointed: (see #10069)
And checking the repository I can find more relative paths, and different outputs on test.csswg.org: the harness cannot find the `../support/abspos-zero-width-001.png` file, but Shepherd can.
Also I see `../` used in several reference files; as those are usually viewed inside Shepherd, they work there.
More suites (apart from CSS2) that contain `../` relative paths:
Also I found that css-fonts uses `../` in this file: http://w3c-test.org/css/css-fonts/support/css/font-variant-features.css, but in this case it works. :smile:
From @fantasai's https://github.com/w3c/web-platform-tests/issues/10053#issuecomment-373556697:
run manual tests (including manual reftest comparison--to work around antialiasing bugs) in most-needed order
I don't think we should be redefining what pass/fail mean for reftests (nor should we expect people to manually run them; there's no reason for people to manually run the majority of the testsuite, because that's just a waste of time). If tests are failing because of anti-aliasing, we should give "fail" as a result in the implementation report (because that's the result of the test), and then give an explanation as to why we believe the failure doesn't matter (i.e., it's an anti-aliasing bug we don't believe is a significant part of the feature). We should definitely be aiming for there to be sufficiently few manual tests that a harness is a marginal benefit.
The last concern I have is that the use of absolute paths requires the tests to be served up on a server, which makes WPT harder to use and develop locally.
Note we already require `testharness.js` to be linked with an absolute path.
From @plinss's https://github.com/w3c/web-platform-tests/issues/10053#issuecomment-373582535:
your solution #3 actually shouldn't be very hard. Removing the file format conversions and output restructuring from the build should be straightforward. The only thing the build would really need to produce would be the manifest files to feed the test harness.
The hard part is modifying the test harness, as the test harness assumes references are referred to as "ref ids" all over the place, IIRC, and that you can just prefix "reference/" to them and have it work. Obviously not insurmountable, but it was hardcoded in enough places that I didn't want to try and change it when I looked at it before.
IIRC the reason we made the CSS build scan the entire repo was when the CSSOM tests got moved out of /css, looks like those are back in /css now (or never quite got moved).
Yes, that's my recollection, I think driven by @zcorpan wanting to use wptserve features that don't work within the test harness (and I think those tests are still broken within the test harness).
you can use relative paths in the css tests for .css, .js, or images, they just can't go up ('../') and need to be either in the same directory or in a './support' directory. There's no difference between handling of .css, .js, or images. The only relative paths that get fixed up currently are for reference files.
As @dbaron basically said, same directory doesn't work: they have to be in a `./support` directory.
As @mrego says:
Also I see ../ used in several reference files, as those are usually view inside shepherd so they work there.
References are put in a `reference` directory alongside the tests, so the tests' support files are in `../support`, so that happens to work.
FWIW: I think manually running tests is mostly a distraction, because nobody is really going to run many of them, and we should work towards having as much as possible automated so we have vendors actually running them regularly (and yes, there are things that are unlikely to get automated, but they're few and far between).
Also I see ../ used in several reference files, as those are usually view inside shepherd so they work there.
Yeah, Shepherd works against the source repo, while the harness works against the built test suites, a bit of apples and oranges.
The hard part is modifying the test harness, as the test harness assumes references are referred to as "ref ids" all over the place, IIRC, and that you can just prefix "reference/" to them and have it work. Obviously not insurmountable, but it was hardcoded in enough places that I didn't want to try and change it when I looked at it before.
It's been a while since I've been in the harness code, but I think the key to solving that is to have the manifest files include the path to the reference file as part of the reference id, then just remove the "reference/" prefixes in the harness. (It'll have to use a full path as the test file id as well anyway since the tests won't be living in the root of the test suite either, which it also assumes.) Unfortunately that will mean that if a test gets moved within the repo the harness will consider it a different test and old results will get lost (the test equivalency code could possibly be leveraged to address that). We also have to lose the multiple test format feature of the harness (or at least disable it for current suites and keep it for the old ones).
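To make that concrete, a tiny sketch of the id change (hypothetical function names and shapes; this is not the actual harness code):

```python
def old_ref_url(suite_root, ref_id):
    # Roughly the current assumption: flat ref ids, with a hardcoded
    # "reference/" prefix under the suite root.
    return f"{suite_root}/reference/{ref_id}.html"

def new_ref_url(ref_path):
    # The proposed change: the manifest stores the reference's full path
    # as its id, so the harness serves it as-is from wherever it lives.
    return ref_path
```

The trade-off is the one described above: keying results by full path means a moved test looks like a brand-new test, so old results are lost unless something like the test equivalency code maps old paths to new ones.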
FWIW: I think manually running tests is mostly a distraction, because nobody is really going to run many of them, and we should work towards having as much as possible automated so we have vendors actually running them regularly (and yes, there are things that are unlikely to get automated, but they're few and far between).
Unfortunately there are tests that no one has figured out how to test in an automated fashion yet. So until that gets solved, manual tests aren't a "distraction", they're a necessity. Yes, we all want as few of them as possible, but we can't simply ignore them.
run manual tests (including manual reftest comparison--to work around antialiasing bugs) in most-needed order
If reftests are failing but the feature is implemented correctly, we need to prioritize fixing that by some means. We certainly shouldn't sweep this under the carpet by manually overriding the results, because that's not a solution that vendors can reasonably use.
There are several possible approaches to improving things here; the most obvious is allowing specified fuzziness in reftests (N pixels different, maximum difference of X per channel).
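A minimal sketch of such a fuzziness check, assuming same-length lists of (r, g, b) tuples in place of real decoded screenshots (`fuzzy_match` and its parameters are hypothetical names, not an existing API):

```python
def fuzzy_match(pixels_a, pixels_b, max_channel_diff, max_differing_pixels):
    """Screenshots "match" if every differing pixel stays within
    max_channel_diff per channel, and at most max_differing_pixels
    pixels differ at all."""
    differing = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(pixels_a, pixels_b):
        worst = max(abs(r1 - r2), abs(g1 - g2), abs(b1 - b2))
        if worst == 0:
            continue
        if worst > max_channel_diff:
            return False  # a channel moved too far: treat as a real difference
        differing += 1
    return differing <= max_differing_pixels
```

For what it's worth, wpt did eventually gain a `fuzzy` reftest annotation along these lines.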
The last concern I have is that the use of absolute paths requires the tests to be served up on a server, which makes WPT harder to use and develop locally.
Generally wpt is not expected to work without a server, and I object to having restrictions that are based on the desire to avoid a server in a subset of cases. There is already tooling to make running the server easier (e.g. `wpt serve`) and I'm quite prepared to put more effort into making this even easier if people have specific use cases that don't work well. All tests should be able to use server-side features where possible, and should not run into issues because of the differences between the (unspecified, browser-specific) behaviour of `file://` and other schemes.
Unfortunately there are tests that no one has figured out how to test in an automated fashion yet. So until that gets solved, manual tests aren't a "distraction", they're a necessity.
They are a distraction if we are building systems around the assumption that there will be a lot of manual tests. Requirements for features like crowdsourcing results and long-term results storage seem predicated on the idea that there are so many manual tests that it's not reasonable for someone to sit down and run all the manual tests for a single spec in a session, generating a results file. If that is true, it's a problem for other reasons. If it's not true, then a simple client-side harness that produces a JSON file is all the infrastructure required.
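A sketch of what such a results file could contain: one manual session's worth of results serialized to JSON (the shape here is an assumption, only loosely inspired by wptreport-style output; `results_to_json` is a hypothetical name):

```python
import json

def results_to_json(spec, results):
    """Serialize one manual run: `results` maps test path -> "pass"/"fail"."""
    report = {
        "spec": spec,
        "results": [
            {"test": test, "status": status.upper()}
            for test, status in sorted(results.items())
        ],
    }
    return json.dumps(report, indent=2)
```

A file like this could then be merged with results imported from automated runs, which covers much of what the harness's recording feature provides.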
I think it's valuable to not require the server for the large numbers of tests that don't need it. This allows things like developing the tests elsewhere and then copying them into the wpt repo at the end, or other patterns that are often useful, such as allowing reftests developed for other reftest systems to be used as wpt reftests and vice-versa. It also sometimes allows debugging techniques or tools (especially if you need to debug tests on devices that aren't desktop/laptop computers) that are a good bit harder if the server is required.
you can use relative paths in the css tests or .css, .js, or images, they just can't go up ('../') and need to be either in the same directory or in a './support' directory. There's no difference between handling of .css, .js, or images.
I have also found that relative font paths which go up (`../`) cause the font not to be loaded, as this seems to trigger a not-same-origin restriction. At least when viewing from local disk.
FWIW: I think manually running tests is mostly a distraction, because nobody is really going to run many of them,
Testing APIs is straightforward: automatic results are returned. Testing CSS can sometimes do that, but not always; CSS is primarily used to produce a visual result, which can sometimes be automatically compared to a reference (but not always), and thus there is a need to manually run tests.
Those of us who do, frequently, manually run tests appreciate being able to do so, and having the nice, autogenerated reports to examine, filter, and drill down into to see who is failing what at what browser version on which platform. And having those test results go away, because someone made a non-substantive change to a test, is frustrating and wastes a lot of time having to re-run tests that we already had data for.
For fonts, that should apply to a subdirectory too, since each directory is considered a separate origin.
I think there's rough consensus across many of us that keeping the extra rules isn't workable longer term, because it stops us from doing large-scale cleanup of the CSS testsuite
I think the fact that the rest of WPT does not have the requirement the CSS tests have, of actually linking to the specific section or sections of the spec(s) being tested, is a bug and reduces the value of WPT. If a test author can't be bothered to document, or perhaps does not even know, what precisely they are testing, then the test is at best suspect (causing extra work for others to figure out what they thought they were testing) and at worst wrong or misleading.
(most obviously, getting rid of the large number of duplicate files within it, which currently all have to be changed simultaneously),
Personally, if I revise a test and change a support file, it is faster and more efficient to know that I only have to care that my changes are correct for that test. If I have to search the entire test suite for every other test that uses a file of the same name, and verify one by one that the updated support file is still valid for all those other tests, then that is significant extra workload. Disk space is cheap, skilled labor is not.
The key functionality that I've been relying on lately is the CSSWG test harness's ability to:
- run manual tests (including manual reftest comparison, to work around antialiasing bugs) in most-needed order
- record results during those manual runs, and integrate them with imported results from automated harnesses
- report and analyze the test results, to help us draw up implementation reports, analyze the state of the spec, and file bugs against implementations
Just to note that I also very much value those features, and use them daily.
Personally, if I revise a test and change a support file, it is faster and more efficient to know that I only have to care that my changes are correct for that test. If I have to search the entire test suite for every other test that uses a file of the same name, and verify one by one that the updated support file is still valid for all those other tests, then that is significant extra workload. Disk space is cheap, skilled labor is not.
Yes, that's faster to review, but then you might end up with, for example, 10 duplicated helper files that start to diverge... IMHO that's not good for the maintenance of those helper files in the repository, and it'd be better to have just one copy of them.
Personally, if I revise a test and change a support file, it is faster and more efficient to know that I only have to care that my changes are correct for that test. If I have to search the entire test suite for every other test that uses a file of the same name, and verify one by one that the updated support file is still valid for all those other tests, then that is significant extra workload. Disk space is cheap, skilled labor is not.
You still have to update the support file for other tests, though, because all the support files must be byte-for-byte identical.
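That byte-for-byte rule is at least cheap to verify. A minimal sketch (the input shape and function name are assumptions, not the actual wpt lint): given a mapping from support-file paths to their contents, report every basename whose copies have diverged:

```python
from collections import defaultdict

def find_diverged_support_files(files):
    """`files` maps repo paths (of duplicated support files) to bytes.
    Return the sorted basenames that exist with more than one distinct
    content, i.e. copies that are no longer byte-for-byte identical."""
    by_name = defaultdict(set)
    for path, data in files.items():
        by_name[path.rsplit("/", 1)[-1]].add(data)
    return sorted(name for name, blobs in by_name.items() if len(blobs) > 1)
```

Keying on basename matches the convention described in this thread, where duplicated support files share a name across `support/` directories.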
I've opened https://github.com/w3c/web-platform-tests/issues/10185 for the ability to run tests from file:///, given that's come up again here, and we really ought to have one single issue for that rather than having it split across ten million places.
@svgeesus Wrt support files, the CSSWG test infrastructure used to have scripts that would copy the contents of support/ directories further up in a directory structure to those further down and ensure binary compatibility, so files shared by a set of tests would go into the support/ directory of their common ancestor directory and get copied down for use throughout; e.g. hixie's swatch-color.png files or his cat would get propagated through the css2.1 directory structure and be available for use in new tests. I don't know what the state of that tooling is now, though.
The Working Group just discussed WPT - Extra CSS testsuite requirements.
Someone (@fantasai? @frivoal?) asked what other people's IRs based on wpt were like; http://w3c.github.io/test-results/IndexedDB/ is an example of this (generated using wptreport).
Thanks. On the plus side, this kind of report is really to the point. On the flip side, it is awfully dry and hard to interpret if you don't have context. In other words, I am way more verbose when I do this :) see the css-ui report for example.
Anyway, the wording of the report is different, but the test result part is quite similar (except that it is missing the optional / mandatory classification).
#9718 broke the CSS WG's test harness. See discussion in https://github.com/w3c/web-platform-tests/pull/9718#issuecomment-371682332, #9940, and #9953, as well as from IRC.
The status quo is that we have a number of lints to keep the CSS test harness (and the build system it relies on) working; however, these do not suffice to ensure everything keeps working.
Ideally, we'd get rid of all the extra requirements the test harness imposes (directly, and indirectly through the build system), as this appears to be the only achievable path to a common set of requirements across the entire repo (I think it's clear from prior discussions that extending the currently CSS-only ones to the whole repo is not a feasible option).
Things the build system relies on we don't lint for

Things I'm aware of missing that are required to keep the built output correct:

- Resources must be in `support/` or whitelisted top-level directories (this is, of course, impossible in the general case once JS is involved, because it trivially reduces to the halting problem). (This is basically #10017, I think?)
- `.htaccess` files that match `.headers` files, especially after moving files around. (#5467)
- Files without `<link rel=help>` could potentially be included (and the extra lints are only run in `css/`).
- `<link rel=help>` needs to point to an existent anchor. (#5646)

Possible solutions
There are, as far as I'm aware, a number of ways to solve this:
Ways in which wpt.fyi doesn't suffice
When it comes to option 4, my understanding of the reasons why wpt.fyi doesn't currently work is (ignoring current infra issues like reliability of test runs and the fact we're currently running old versions of browsers):

- There's no per-spec view of results: tests for a single spec can live in multiple directories (e.g., `css/css-writing-modes/` and `css/vendor-imports/mozilla/mozilla-central-reftests/writing-modes-3/`), let alone tests that apply to multiple specs (e.g., `css/CSS2/visuren/inherit-static-offset-001.xht`, which has links to both CSS2 and css-cascade).

It would be nice to know how important each of these is considered and whether it blocks moving away from the existing test harness.