zcorpan opened this issue 6 years ago (status: Open)
I think we should try to make all harness errors fail the Travis jobs, unless while trying to do that we discover some reason why harness errors really should be allowed. I imagine that @jgraham has thoughts on this.
What do you mean "harness errors"? Do you mean errors in testharness.js or in wptrunner?
Assuming you mean the former, there is an already-discussed ambiguity between single page tests, which are likely to produce ERROR if the feature in question isn't implemented, and tests that unintentionally error due to a bug. One could certainly look for cases where multiple browsers produce ERROR. This was the problem that the pulls dashboard was supposed to solve, but that seems to have stopped working.
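For context, this is the failure mode being described: a minimal sketch of a test where feature access happens outside any test function, so a browser without the feature ends up with a file-wide harness ERROR rather than a FAIL. FeatureObserver is a placeholder interface, not a real API.

```html
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
// FeatureObserver is a hypothetical interface used only for illustration.
// In a browser that doesn't implement it, this top-level line throws a
// ReferenceError; the uncaught exception is reported as a harness ERROR
// for the whole file instead of a FAIL on an individual test.
const observer = new FeatureObserver(() => {});

test(() => {
  assert_equals(observer.state, "idle");
}, "FeatureObserver starts in the idle state");
</script>
```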
We're talking about testharness.js harness errors. We can make progress on this even if https://github.com/w3c/web-platform-tests/issues/10853 remains unresolved, it just means that if https://github.com/w3c/web-platform-tests/issues/10853 is fixed, we might need to fix a bunch of new harness errors at the same time. I would slightly prefer the other order of fixing though.
Also TIMEOUT may be good to prevent.
TIMEOUT is fine in the case where you're waiting for an event that never happens. The alternative is that authors add some shorter timeout, making the test more prone to intermittent failures.
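To illustrate the pattern being defended, a minimal sketch of such a test; the "somechange" event name is a placeholder:

```html
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
// If the implementation never fires the event, nothing in the test ever
// fails; the harness simply gives up after its own timeout and reports
// TIMEOUT. A shorter author-chosen timer that calls t.unreached_func()
// would instead risk intermittent failures on slow bots.
async_test(t => {
  document.addEventListener("somechange", t.step_func_done(e => {
    assert_equals(e.type, "somechange");
  }));
}, "somechange event is fired");
</script>
```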
Even for ERROR, where a top-level error status might well indicate a bug, I'd be interested to see evidence for or against the theory that an ERROR in a subset of implementations is usually indicative of a test bug and so should block landing. For cases where no browser has a non-ERROR, non-TIMEOUT status, the case for blocking landing is more obviously strong, but it does disadvantage tests written against implementations we can't run in CI (notably Safari/WebKit and Edge).
@zcorpan @jugglinmike, in your triage of tests that produce harness errors, have you come across any that you don't believe are test bugs? Failing to find such hard cases is IMHO a reason to prevent them from being introduced as suggested in this issue, but I think we should still have an escape hatch in lint.whitelist for cases when browser bugs are the cause.
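If the check were implemented as a lint rule, that escape hatch could be an ordinary lint.whitelist entry. The rule name and path below are hypothetical (no such rule exists yet); only the "RULE: path-pattern" shape matches how the file is already written.

```
# Hypothetical entry, assuming a harness-error lint rule were added.
HARNESS ERROR: css/vendor-imports/some-imported-test.html
```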
I found https://github.com/web-platform-tests/wpt/issues/11269#issuecomment-393525769 today.
Maybe we shouldn't block landing of TIMEOUT or even ERROR, but making it visible when it happens somehow could be useful; in most cases it's a bug in the test.
That was one of the things that I thought was planned for the PR dashboard before it got abandoned. I certainly think that requiring people to think before landing a PR that introduces ERROR everywhere is a good plan.
Indeed, there are patterns of changes that we want to make sure do not happen accidentally, but which are still sometimes correct. For things like that, I think the options are to either post a comment about it, which is possible to overlook, or to have a failing status check that leads to a page describing a problem, along with a "make the status green" checkbox for when it's actually appropriate. A bit complex, I know. cc @lukebjerring
As a first step towards this, when implementing https://github.com/web-platform-tests/wpt/issues/7475, we can treat new harness errors as more serious and a different category of problem than new failing tests.
@jugglinmike, do you have a somewhat up-to-date link to the report of harness errors, so we can see just how long the path to eliminating harness errors might be?
Not offhand, but it's quick to generate one. Here are the failures for WPT revision 4a15e98c1339681493dff207de2c69ea29efa602, dated Thu Aug 2 11:45:23 2018 +0200:
Ugh, by "failing" I really mean "producing errors".
Alright, that isn't a huge number, 591 in total. I know that @zcorpan has already fixed a bunch. @jgraham, if we're able to fix >80% of these as test bugs, and not many remain because of implementation bugs, would that be a strong enough case to turn harness errors into something that by default fails PRs unless the test is whitelisted?
@jugglinmike, the list in https://github.com/web-platform-tests/wpt/issues/10877#issuecomment-410348344 is great! I sent PRs to fix the "producing errors in 5 browser(s)" tests, and I imagine many of the rest are similar oversights. Preventing this with tooling would be fantastic.
I certainly agree that some mechanism to require manual approval if there are errors in multiple browsers seems reasonable, independent of how hard it is to fix. For tests that only error in a single browser, I guess it depends what the fix looks like; in general I guess these additional checks aren't going to get run on the Chrome/Gecko/other upstream side, so it seems like triggering this failure too often might be problematic for developers primarily contributing upstream. Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).
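(For illustration only: one way such a test can be "written more carefully" is to keep all feature access inside the test function, so an unimplemented feature produces FAIL for that subtest rather than a harness ERROR. FeatureObserver below is the same placeholder interface as above, not a real API.)

```html
<script src="/resources/testharness.js"></script>
<script src="/resources/testharnessreport.js"></script>
<script>
// Everything that can throw stays inside test(), so a browser without the
// hypothetical feature reports FAIL for this one subtest instead of an
// ERROR for the whole file.
test(() => {
  assert_true("FeatureObserver" in window, "FeatureObserver should be supported");
  const observer = new FeatureObserver(() => {});
  assert_equals(observer.state, "idle");
}, "FeatureObserver starts in the idle state");
</script>
```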
All that said, I don't know what the mechanism for allowing tests that produce multiple errors through should be; putting it in the source seems wrong because it's not a property of a specific git revision, but a property of running that revision in specific revisions of a set of browsers. Ideally I would like some kind of checkbox in the PR like
[x] Test that ERRORs is not buggy.
But enforcing that could be tricky without building a landing bot.
> Therefore if the ERROR case mostly describes valid tests that could simply have been written more carefully to not error when the features are missing, glagging

I'm unable to guess what this is a typo of...

> them seems like a poor tradeoff (so far I haven't seen complaints about tests we have synced that error, but I did just get a complaint about some tests we synced that were buggy and failed, so we should be realistic about the value of treating ERROR as uniquely bad).
Sure, harness errors aren't the worst kind of problem a test can have, but it is one that's pretty easy to create tooling around, which would help catch some smallish mistakes. (See my 3 PRs above.)
> Ideally I would like some kind of checkbox in the PR
With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.
It would require us to create/host some infrastructure for such status checks, though, or perhaps wait for Taskcluster to get it.
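As a rough sketch of what that infrastructure might do (not an existing implementation; the repo, SHA, token, check name, and action identifier are all placeholders), the Checks API lets a check run carry a "requested action" button, and clicking it sends a webhook that such a service could handle by flipping the check to neutral:

```js
// Sketch only: creates a failing check run with an "Ignore this" button via
// the GitHub Checks API (https://developer.github.com/v3/checks/).
// A real integration would run as a GitHub App with checks:write permission.
const fetch = require("node-fetch");

async function reportHarnessErrors(owner, repo, headSha, token) {
  const response = await fetch(`https://api.github.com/repos/${owner}/${repo}/check-runs`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
      // The Checks API was still a preview at the time, hence this media type.
      "Accept": "application/vnd.github.antiope-preview+json",
    },
    body: JSON.stringify({
      name: "harness errors",  // placeholder check name
      head_sha: headSha,
      status: "completed",
      conclusion: "failure",
      output: {
        title: "New harness errors introduced",
        summary: "This PR adds tests that produce a harness ERROR status.",
      },
      // Shown as a button on the check; clicking it sends a check_run
      // "requested_action" webhook that the service could answer by
      // re-creating the check with conclusion: "neutral".
      actions: [{
        label: "Ignore this",
        description: "The harness error is not a test bug",
        identifier: "ignore-harness-error",  // placeholder identifier
      }],
    }),
  });
  if (!response.ok) {
    throw new Error(`Failed to create check run: ${response.status}`);
  }
}
```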
> I'm unable to guess what this is a typo of...
flagging.
> With https://developer.github.com/v3/checks/ we'd be able to have a button that says "this harness error isn't a bug", see the "Ignore this" in https://github.com/foolip/web-platform-tests/runs/9087881 for an example.
I don't see such a button, are you sure you don't have to be an admin of the repo to use it? That would be equivalent to the situation today, but we certainly want the PR author or reviewers to be able to override the checks.
Yeah, I guess you need write access to the repo. Here's a screenshot:
Given that we'll stop running stability checks in Travis in favor of results+stability in Taskcluster, this issue isn't quite accurate. I'll change the title to reflect that it might not involve Travis.
Ping from your friendly neighbourhood ecosystem infra rotation
If this is priority:roadmap, should it have an assignee?
So, in theory a harness error will be picked up as a regression by the wpt.fyi status check.
That check is currently neutral when it detects a regression, but we do have a feature for making it fail (and an Ignore action built into the check), which I think satisfies this issue's current title:

> Make harness errors block PRs (with override)
So, currently blocked on launch of https://github.com/web-platform-tests/wpt.fyi/projects/6 ?
@lukebjerring can this be assigned to you, or are harness errors further down the line than regressions? I guess harness error regressions are already prevented? I actually don't know that we need to do more than that...
This is something I think we might still want to do, leaving as roadmap but probably won't get done in Q2 because @lukebjerring is OOO.
@lukebjerring is this something we should look at again in Q2? I don't know how big of a problem it is that tests with new harness errors land vs. the effort required to prevent it. I guess it'd just be a failing wpt.fyi check, and a new action for "I really mean it"?
@jugglinmike made a list of tests producing harness errors in https://github.com/web-platform-tests/results-collection/issues/478#issuecomment-384131457
@foolip do we want to make harness errors fail in Travis?