web-platform-tests RFCs

RFC #16 - PRECONDITION_FAILED subtest result (status) #16

Closed lukebjerring closed 5 years ago

lukebjerring commented 5 years ago

Rendered

jgraham commented 5 years ago

I'm somewhat on the fence about whether this is a good idea. In practice truly optional features are rare and almost always represent compat hazards. We don't want to go out of our way to encourage such optionality.

I am concerned about the amount of code that has to change to support this. For example mozlog has to change, and log consumers may also have to change to handle the status in a meaningful way. Of course it's not possible to enumerate the full set of log consumers.

In terms of the API, this isn't an assert; it should be a method on the test, like force_timeout, which immediately sets the status to TIMEOUT.

jugglinmike commented 5 years ago

The RFC currently focuses on consumers of test results. You may also want to note that this makes the tests themselves more expressive, allowing authors to be more explicit about their expectations.

Currently, tests for proposed specification changes are denoted with a filename pattern (.tentative.). This use case seems semantically equivalent, and since your solution operates at a sub-test level, it's more precise. I'm wondering if your solution could replace the file name convention. That would reduce the learning curve for contributors.

One nice aspect of implementing this as an assertion is that "failures" will interrupt further testing. Maybe invert the meaning, though. assert_implemented wouldn't need to be used from a separate codepath; the condition could be expressed as an argument to the function.
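To make the inverted shape concrete, here is a minimal self-contained sketch of what such an assertion could look like. Note that assert_implemented and NotImplemented are illustrative names only, not existing testharness.js API; a real harness would map the thrown error to a dedicated status rather than print it.

```javascript
// Hypothetical error type a harness could recognize as "precondition not met"
// rather than an ordinary assertion failure.
class NotImplemented extends Error {}

// The condition is an argument, so no separate codepath is needed:
// a falsy condition interrupts further testing, like any other assert.
function assert_implemented(condition, description) {
  if (!condition) throw new NotImplemented(description);
}

try {
  assert_implemented("MyFeature" in globalThis, "MyFeature is not implemented");
  // ...the remaining assertions would run only when the feature exists.
} catch (e) {
  if (e instanceof NotImplemented) console.log("not implemented:", e.message);
  else throw e;
}
```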

lukebjerring commented 5 years ago

In practice truly optional features are rare and almost always represent compat hazards

While rare (I'm currently only aware of Codecs and WebCryptoAPI), I still feel this is an important distinction to make, especially for permanently unimplemented features.

WebCryptoAPI, for example, could have a valid implementation that didn't implement any of the encryption algorithms. (IMHO, it's preferable that we achieve interop for the implementations, since this would avoid a web dev needing to worry about which user agents implement which algorithms). We've heard back from Chrome that not supporting AES 192 bit, for example, is working as intended.

The conclusion was that Web Crypto does not mandate support for any particular algorithms, and would only go so far as to recommend certain algorithms at the time it was published. Compatibility has been left to external interop profiles.

Thus, the "FAIL" WebCryptoAPI results aren't really useful as failures (and create unnecessary noise).

lukebjerring commented 5 years ago

@jugglinmike - nice to hear two sides of the argument. Under further scrutiny, though, I think I side with @jgraham here in that we don't want to encourage people to unnecessarily make spec features optional, so I'm unsure about coupling this change with .tentative. use-cases.

As I understand it, .tentative. is more about whether the details covered by the test will actually stick in the spec long term. My intention for NOT_IMPLEMENTED is more about flagging tests as: "if this one of many optional implementations does exist, here are some assertions about its behaviour".

Bailing out (immediately) with a forced test result via test.unimplemented() is equally expressive, since the conditions leading up to it can be a try/catch, an if, or whatever.

lukebjerring commented 5 years ago

@foolip - was codecs the only other example you mentioned? Are you aware of other APIs that list optional capabilities?

foolip commented 5 years ago

@lukebjerring will be OOO for a while, so I suggest we put this on ice until he's back.

One option that we could consider is something inspired by what mocha does. There, the context object, which is like our test object, has a skip method. It'd work like this:

test(t => {
  if (thingIsntSupported) {
    t.skip('maybe a reason here');
  }
}, 'test name');

This aligns with @jgraham's suggestion to use a method (test.unimplemented()), but is a bit more explicit. It would still result in a new subtest status, however.
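As a purely illustrative sketch (not testharness.js internals), here is how a t.skip() call could abort the rest of a subtest body and record a dedicated status; the SubTest, SkipSignal, and status names are assumptions:

```javascript
// Internal signal used to unwind out of the test body when skip is called.
class SkipSignal extends Error {}

class SubTest {
  constructor(name) {
    this.name = name;
    this.status = null;
    this.message = null;
  }
  skip(reason) {
    // Record a dedicated status and abort the rest of the test body.
    this.status = "SKIP";
    this.message = reason;
    throw new SkipSignal(reason);
  }
}

function test(fn, name) {
  const t = new SubTest(name);
  try {
    fn(t);
    t.status = t.status || "PASS";
  } catch (e) {
    // A SkipSignal already set the status; anything else is a failure.
    if (!(e instanceof SkipSignal)) t.status = "FAIL";
  }
  return t;
}

const thingIsntSupported = true; // stand-in for a real feature check
const result = test(t => {
  if (thingIsntSupported) {
    t.skip('maybe a reason here');
  }
}, 'test name');
console.log(result.status); // SKIP
```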

foolip commented 5 years ago

@lukebjerring now that you're back, do you want to pick this up again? What do you think about the t.skip() method to match mocha?

Orphis commented 5 years ago

I am currently writing codec related tests for the WebRTC test suite and this feature (or something equivalent) would definitely be useful to convey a better meaning to the test results.

I do prefer the t.skip() approach in this case.

foolip commented 5 years ago

@lukebjerring https://wpt.fyi/results/html/semantics/embedded-content/media-elements/mime-types/canPlayType.html is the media element codec test I had in mind. It adds "(optional)" to some test names, but unfortunately I don't know that this test would be much improved by a mechanism to skip subtests, as the "is this implemented" check itself is what the test is for.

lukebjerring commented 5 years ago

I agree with the ergonomics of a t.skip. Let's update the RFC to reflect that.

foolip commented 5 years ago

@lukebjerring I'm writing a test today that could make use of this. Is there anything I can review to help move this along?

jgraham commented 5 years ago

I commented in the related PR, but I'm somewhat concerned that the approach of reusing SKIP here makes it impossible to tell the difference between a test that has been explicitly disabled (e.g. due to instability) and one that has failed a precondition check in the test itself. The ability to audit tests that have been disabled is useful and I'm nervous about regressing that.

foolip commented 5 years ago

When disabling tests, isn't the status of the whole test SKIP, rather than of the subtests?

jgraham commented 5 years ago

You can also disable individual subtests (basically meaning that they run but the result is replaced by SKIP).

lukebjerring commented 5 years ago

If I understand correctly, there's no overlap between the usage of the two features (in-test skip and disable-skip), only a common outcome. There's still a distinguishing message field for the reason they were skipped, and I'd expect you would never disable a skipping subtest as unstable (though I guess we could contrive a badly written test that makes it skip unstably sometimes).

Can you give an example of how the feature is used (or point to some docs)? I wasn't aware of it.

jgraham commented 5 years ago

See e.g. https://searchfox.org/mozilla-central/source/testing/web-platform/meta/html/dom/interfaces.https.html.ini#61 which disables that subtest. Actually looking at the code it seems we are just ignoring those tests (see https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/testrunner.py#561-562); that's clearly a bug and we should set the status to SKIP like we do for top-level tests.
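For readers unfamiliar with the feature, disabling an individual subtest in wptrunner expectation metadata looks roughly like this (a sketch with a hypothetical subtest name and reason; see the linked .ini file for a real example):

```ini
[interfaces.https.html]
  [SomeInterface interface: existence of the interface object]
    disabled: some reason or bug URL
```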

lukebjerring commented 5 years ago

So, we actually started with NOT_IMPLEMENTED in the proposal, but it was too specific, since non-implementation isn't the only reason we might skip. Funnily enough, the use of SKIP you describe might be better named DISABLED?

Is there a way of 'overloading' result types? Like, multiple enum values that use the same numeric value?

jgraham commented 5 years ago

I don't think we can overload result types. Changing SKIP to DISABLED unfortunately breaks any tooling that currently depends on SKIP (the fact that it isn't provided on subtests notwithstanding; tools can often treat subtest statuses and test statuses identically). It seems like the cases discussed here are more like PRECONDITION_FAILED, if the main concern is naming.

lukebjerring commented 5 years ago

Naming is only a concern once disambiguation is made a prerequisite. It seems that your need to distinguish "infra skipped" vs "test skipped" is only theoretical, given that the implementation as-is omits the data entirely (which wpt.fyi would show as MISSING, a cosmetic/UI fake status).

Another approach, which I understand has come up before, is to support multiple expectations. This is Chromium's approach to ignoring flakiness, and the equivalent here would just be to support an ANY alias that lets all statuses through. I believe @LukeZielinski has this work on his radar, as it relates to using the wpt-runner directly inside Chromium's CQ.

What are your thoughts on multi-expectations?

jgraham commented 5 years ago

There is separate work on multiple expectations happening. But I don't expect it to be a full replacement for disabling tests, so having SKIP mean either "disabled" or "test decided not to run" is still confusing.

lukebjerring commented 5 years ago

There is separate work on multiple expectations happening

Is there an RFC/PR/bugzilla entry for tracking the work? And yeah, that's true, there are still use cases for disabling tests outside of flakiness, so it won't be a full replacement.

In those remaining cases, though, the behaviour around SKIP for disabled tests is misleading as it stands: the test wasn't skipped (it did run), and the result is being omitted/ignored. In those cases, there's a need to disambiguate an omitted (MISSING) result as "dropped because ignored" vs "not actually run", and the use of the SKIP result is probably a misnomer.

Given that (due to what you call a bug in the implementation) we're not actually in a position where SKIP on subtests is ambiguous, can we address the need to distinguish "disabled" separately? Perhaps in another RFC? My first thought is that DISABLED is a better status name for those cases, and we could eventually migrate that codepath (which currently ignores the subtest entirely) to emit a different status, accepting a slight risk of confusion in the short term to unblock the need for SKIP motivating this RFC.

jgraham commented 5 years ago

There is separate work on multiple expectations happening

Is there an RFC/PR/bugzilla entry for tracking the work? And yeah, that's true, there are still use cases for disabling tests outside of flakiness, so it won't be a full replacement.

https://bugzilla.mozilla.org/show_bug.cgi?id=1552879

In those remaining cases, though, the behaviour around SKIP for disabled tests is misleading as it stands: the test wasn't skipped (it did run), and the result is being omitted/ignored. In those cases, there's a need to disambiguate an omitted (MISSING) result as "dropped because ignored" vs "not actually run", and the use of the SKIP result is probably a misnomer.

Probably, but the argument for the status quo, plus a fix to wptrunner to report SKIP on subtests, is that the reason for disabling subtests is similar to the reason for disabling full tests, and using SKIP for both allows tools to have a single codepath that works for all cases. If SKIP means "explicitly disabled" in some places but something different in others, any existing tooling needs to be updated to understand the difference. If SKIP on tests is equivalent to DISABLED on subtests, the situation is even more confusing.

I understand that adding a new status is a relatively large undertaking and I'm sympathetic to the desire to avoid that work. But this RFC is really adding a new semantic that we haven't had before so it seems correct to associate that with a novel test status.

lukebjerring commented 5 years ago

If SKIP on tests is equivalent to DISABLED on subtests the situation is even more confusing.

Not what I meant. To clarify, I am proposing that what is currently called SKIP (explicitly disabled tests/subtests) be (later) migrated to a new status, DISABLED, and SKIP be retained exclusively for what this RFC is about (skipping a test at runtime).

As a stepping stone, in the interim, we'd (ambiguously) have both outcomes as SKIP, i.e. DISABLED (test-level) and SKIP (subtest-level). The immediate consequences for tooling should be limited or none, since the current behaviour for disabled subtests is to drop the result from the output, and only new usages of the t.skip() pattern would produce SKIP as a subtest result (so there's no ambiguity).

Steps could be:

WDYT? I argue that the work for a novel status is needed whether we (A) rename SKIP here to something else (PRECONDITION_FAILED is accurate, but long!) or (B) run with DISABLED. Since the work is needed either way, doesn't it make more sense to take the more precise naming?

jgraham commented 5 years ago

If SKIP on tests is equivalent to DISABLED on subtests the situation is even more confusing.

Not what I meant. To clarify, I am proposing that what is currently called SKIP (explicitly disabled tests/subtests) be (later) migrated to a new status, DISABLED, and SKIP be retained exclusively for what this RFC is about (skipping a test at runtime).

As a stepping stone, in the interim, we'd (ambiguously) have both outcomes as SKIP, i.e. DISABLED (test-level) and SKIP (subtest-level). The immediate consequences for tooling should be limited or none, since the current behaviour for disabled subtests is to drop the result from the output, and only new usages of the t.skip() pattern would produce SKIP as a subtest result (so there's no ambiguity).

This plan seems to have all the complexity of introducing a new status code, plus the complexity of auditing and updating existing consumers in case they have a codepath that handles SKIP in test_end and test_status the same way, which is a fairly reasonable thing to do. By contrast, the plan where we keep SKIP meaning "disabled" has only the first complexity (introducing a new status code), but maybe makes the internal codes map better to people's intuitions. However, the latter point is at least debatable, since other test frameworks use skip in a way that is analogous to explicitly marking a test as not to be run.

I think the main difference here is that I'm coming from the point of view where there are many existing consumers of these logs in various places, because that's the situation for gecko. If you are thinking that all consumers are in wpt (or even just in a single repo) then obviously the need to update these consumers will seem less significant in making a decision.

lukebjerring commented 5 years ago

Yeah, less work is always a good thing.

So, if tweaking this RFC is the best way to move forward, how about a figuratively synonymous but semantically distinct result name? I asked my wife for "words that kinda mean skipping over something" and she suggested BYPASS. I can't decide if it's terrible, or quaint. Thoughts?

Ms2ger commented 5 years ago

@jgraham ?

jgraham commented 5 years ago

We talked about this and I think the intent is to go forward with the plan to introduce a new status for this feature, but I don't remember if we decided what that status ought to be.

lukebjerring commented 5 years ago

I'll be updating the RFC and corresponding PR with a new status name soon. OOO for the next week.

lukebjerring commented 5 years ago

@jgraham - one remaining ambiguity is the t.skip('reason message') helper. I haven't managed to come up with a different imperative verb for "produce a PRECONDITION_FAILED result". Are we OK with reusing the verb skip here? If not, do you have a suggestion for the method name?

zcorpan commented 5 years ago

How about t.precondition(condition, message)?

foolip commented 5 years ago

That would be OK I think. @lukebjerring, preferences?

jgraham commented 5 years ago

Yeah, I'm not that bothered about the function name, but precondition does allow you to write an assert_precondition helper, which might be more natural in some cases, e.g.

test(t => {
    assert_precondition(MyInterface in window);
   // Actual test
})

lukebjerring commented 5 years ago

@zcorpan - my takeaway from your suggestion is that it's not great that t.skip has diverged from the resulting subtest status name. Provided you're not passionate about t.precondition, @jgraham's suggestion is more aesthetic IMHO. I'll update the prototype PR to ditch t.skip and instead add an assert_precondition helper.

zcorpan commented 5 years ago

assert_precondition sounds good. Having the names match up seems like a good idea to me, to avoid confusion.

Noting for future copy-pasters that @jgraham's example is wrong, it should be

test(t => {
    assert_precondition("MyInterface" in window);
   // Actual test
})

(or in self if it should work in non-window context)
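To show how the agreed pieces fit together, here is a minimal self-contained sketch (not the actual testharness.js implementation; run_subtest and PreconditionFailed are illustrative names) of how assert_precondition could map a failed check to the distinct PRECONDITION_FAILED status rather than FAIL:

```javascript
// Dedicated error type so the harness can tell a failed precondition
// apart from an ordinary assertion failure.
class PreconditionFailed extends Error {}

function assert_precondition(condition, description) {
  if (!condition) throw new PreconditionFailed(description);
}

// A harness would catch the error type and record the new status:
function run_subtest(name, fn) {
  try {
    fn();
    return { name, status: "PASS" };
  } catch (e) {
    if (e instanceof PreconditionFailed) {
      return { name, status: "PRECONDITION_FAILED", message: e.message };
    }
    return { name, status: "FAIL", message: e.message };
  }
}

const result = run_subtest("AES-192 encrypt", () => {
  assert_precondition(false, "AES-192 not supported");
});
console.log(result.status); // PRECONDITION_FAILED
```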

foolip commented 5 years ago

Came across a test that would benefit from being converted to this new style: https://github.com/web-platform-tests/wpt/blob/50d08ddc4269d774f76e167d87495a141df7134b/content-security-policy/connect-src/shared-worker-connect-src-allowed.sub.html#L20-L23

jgraham commented 5 years ago

So it seems like this RFC needs to be updated to match the decisions that have been made, but once that's done it should be possible to go ahead and merge it.

lukebjerring commented 5 years ago

I've updated to reflect the consensus (as I understand it to be). PTAL.

foolip commented 5 years ago

Renamed file to assert_precondition.md to match the name of the added API surface.

foolip commented 5 years ago

Since the proposal changed, let's leave a bit more time for people to react, but I think it'd be low-risk to start implementing this now across the impacted repos.

foolip commented 4 years ago

All who participated in this RFC, a review is now up at https://github.com/web-platform-tests/wpt/pull/19993 and I think there are some details worth a second opinion on. I'll add comments on those bits and see if I get any feedback :)