web-platform-tests RFCs

RFC 32: Asynchronous setup #32

Closed: jugglinmike closed this 4 years ago

jugglinmike commented 4 years ago

This initial draft is based on my interpretation of the need and my reading of the relevant minutes from TPAC 2019. There are a few alterations I'd like to recommend:

**Disallow `explicit_done`.** The existing `explicit_done` configuration option and the proposed `promise_setup` function have a similar effect on the harness. Using both in conjunction would make it harder to understand the state of the harness. Their similarity makes me think that test authors wouldn't be interested in using both anyway.

**Disallow `test` and `async_test`.** This was mentioned by @jgraham in the TPAC discussion, but there isn't any detail about why it's necessary.

**Modify how `setup` reports errors.** The current behavior of the fully synchronous `setup` may not be an appropriate model for the tests that currently use the ad-hoc methods described in the RFC.

Specifically: when a function passed to setup throws an error, no further tests are registered, and a harness error is reported. The alternatives in use by tests today have different semantics for this case:

| Synchronicity | Method | Harness status | Subtest statuses |
| --- | --- | --- | --- |
| synchronous | `setup()` | ERROR | (no subtests reported) |
| asynchronous | wait in every test individually | OK | FAIL |
| asynchronous | initial dummy test | OK | PASS/FAIL (invalid) |
| asynchronous | `explicit_done` | ERROR | (no subtests reported) |
| asynchronous | `promise_setup()` | :question: | :question: |
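
For concreteness, here is a minimal sketch of the "wait in every test individually" pattern from the table; the setup operation and assertions are placeholders, not code from any existing test.

```js
// Illustrative only: each subtest awaits a shared setup promise before
// asserting. If the setup promise rejects, every subtest FAILs, but the
// harness status remains OK.
const setup_complete = do_async_setup();  // hypothetical setup operation

promise_test(async () => {
  await setup_complete;
  assert_true(true, 'first check runs only after setup');
}, 'subtest 1');

promise_test(async () => {
  await setup_complete;
  assert_true(true, 'second check runs only after setup');
}, 'subtest 2');
```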

In some ways, I think that the failure mode of "wait in every test individually" is preferable. The "OK" harness status avoids suggesting that there is a bug in the infrastructure, and the failing subtest results give a more complete picture of the test's intention. The only risk I can see is instability--some tests might end up defining fewer subtests if their setup logic doesn't complete successfully (the current behavior of setup avoids this by refusing to report any information about subtests at all). To my mind, though, that implies a high degree of complexity that we ought to discourage in any case.

Consistency with the synchronous behavior is definitely important, but I don't want to implement something that doesn't meet the expectations of test authors. That's why even though the question smells of scope creep, I'd like to ask: should we update the way synchronous setup errors are reported and implement asynchronous setup error reporting in those terms?

I may be overthinking things, though. Is this distinction actually relevant to folks, or should we accept any consistent solution that indicates a problem?

foolip commented 4 years ago

> **Disallow `test` and `async_test`.** This was mentioned by @jgraham in the TPAC discussion, but there isn't any detail about why it's necessary.

This is because both test and async_test run the function callback immediately, so if defined right after the promise_setup() call they would run during setup. Disallowing them isn't the only way to resolve it if you have other ideas.
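
A hedged illustration of that hazard (the setup step and the in-page state are invented for the example):

```js
// Illustration only: test() runs its callback synchronously at the point
// of definition, so the assertion below executes while the setup promise
// is still pending.
promise_setup(async () => {
  window.fixture = await create_fixture();  // hypothetical async setup step
});

test(() => {
  // Runs immediately, i.e. during setup: window.fixture is still undefined.
  assert_true('fixture' in window, 'fixture should exist before this test runs');
}, 'would execute during setup');
```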

Aside: the synchronous execution is also why just supporting setup(async() => { ... }) + test(async() => { ... }) like some other frameworks would be quite a challenge, as that really only works if execution is deferred until after all tests are defined.

zcorpan commented 4 years ago

test and async_test are only a problem before promise_setup() has resolved, right?

I think test and async_test can be allowed after that, so you can use promise_setup(...).then(() => test(...)) or top-level await.
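
A sketch of the two shapes described above (purely illustrative; whether the harness should support them is exactly what is under discussion):

```js
// Option A: chain the test definitions off the promise returned by
// promise_setup(), so they are only declared after setup has settled.
promise_setup(async_operation).then(() => {
  test(() => { /* assertions against the prepared state */ }, 'after setup');
});

// Option B (module scripts only): top-level await before declaring tests.
// await promise_setup(async_operation);
// test(() => { /* ... */ }, 'after setup');
```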

Changing when the callback for test and async_test are called based on promise_setup() seems confusing, though.

foolip commented 4 years ago

Sure, after async setup is done it wouldn't be a problem to allow both test and async_test.

jugglinmike commented 4 years ago

I can now see that the proposed algorithm is introducing a fundamentally new kind of behavior, and that seems increasingly undesirable as I think about it.

As written, we'd use the API like this:

```js
promise_setup(async_operation);
test(() => {}, 'runs after setup is complete');
async_test((t) => t.done(), 'runs after setup is complete');
promise_test(async () => {}, 'runs after setup is complete');
```

There is no reference to the return value of promise_setup in the algorithm or the test.

For this to work, tests defined with test and async_test need to be put in a "holding" state. That is the new behavior: as @foolip writes, tests defined with those two functions today always execute immediately.

Introducing a "holding" state has benefits in terms of ease of use and consistency of results structure (because the number of subtests is less likely to be influenced by the UA's performance). However, it also implies a much more expansive change to testharness.js. That change is riskier because it could accidentally alter the behavior of existing tests.

An alternative implementation that is simpler and lower-risk (but unfortunately harder to use) would be used like this:

```js
promise_setup(async_operation);
test(() => {}, 'runs immediately');
async_test((t) => t.done(), 'runs immediately');
promise_test(async () => {}, 'runs after setup is complete');
```

I don't think we can implement an API like what @zcorpan is expecting. For instance:

```js
await promise_setup(async_operation);
test(() => {}, 'runs immediately (which is after setup has completed)');
async_test((t) => t.done(), 'runs immediately (which is after setup has completed)');
promise_test(async () => {}, 'runs immediately (which is after setup has completed)');
```

In that scenario, there is no appropriate moment for the harness to transition to the COMPLETE state. If it transitions following the resolution of async_operation, it will not run the subsequent tests. It would have to wait for some other event. I guess we could just defer by a task or a microtask, but that seems arbitrary and therefore likely to mistakenly ignore tests.

In light of this, here's my new proposal in broad strokes:

I'm still uncertain about the appropriate semantics of setup failure. Should this be considered a harness error? Should the tests be labeled as "failed" or "not run"? Should we update the behavior of synchronous setup error? Or should we just adopt the same ERROR/(no subtest reported) behavior here?

| Synchronicity | Method | Harness status | Subtest statuses |
| --- | --- | --- | --- |
| synchronous | `setup()` | ERROR | (no subtests reported) |
| asynchronous | wait in every test individually | OK | FAIL |
| asynchronous | initial dummy test | OK | PASS/FAIL (invalid) |
| asynchronous | `explicit_done` | ERROR | (no subtests reported) |
| asynchronous | `promise_setup()` proposal #1 | OK | FAIL |
| asynchronous | `promise_setup()` proposal #2 | OK | NOTRUN |
| asynchronous | `promise_setup()` proposal #3 | ERROR | FAIL |
| asynchronous | `promise_setup()` proposal #4 | ERROR | NOTRUN |

jgraham commented 4 years ago

I think the proposal to simply not delay [async_]test for setup comes from the same understanding of the difficulties around combining promises and synchronous execution that led to the original proposal to ban [async_]test with promise_setup, but it takes a different tack on the solution. If we agree that the proposed semantics of promise_setup aren't compatible with [async_]test, I'd much prefer the solution where calling promise_setup poisons the [async_]test functions over the one where we try to execute them anyway, adding implied race conditions on whether the setup was complete.
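
A minimal sketch of the poisoning idea, assuming a simple flag and a thrown harness error; the exact reporting mechanism and internal names are not specified here.

```js
// Illustration only: once promise_setup() has been called, later calls to
// test() or async_test() are rejected outright instead of racing against
// the pending setup.
let promise_setup_used = false;

function promise_setup(fn) {
  promise_setup_used = true;
  // ... schedule fn() and delay queued promise_tests until it settles ...
}

function test(fn, name) {
  if (promise_setup_used) {
    throw new Error('test() may not be used after promise_setup()');
  }
  // ... existing synchronous behavior ...
}
```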

jugglinmike commented 4 years ago

I'm all for that. Should calling test or async_test likewise poison the promise_setup function?

jgraham commented 4 years ago

For test you know you're in the HAVE_RESULTS phase so calling setup will do nothing anyway. For async_test I think it's probably fine to disallow promise_setup, unless there's some pattern where people are putting promise tests and async tests in the same file.

jugglinmike commented 4 years ago

Judging from a naive pipeline, it looks like this is done occasionally:

```
$ git grep -l promise_test | grep -Ev 'docs/|resources/' | xargs grep -l async_test | wc -l
78
```

jugglinmike commented 4 years ago

(moving from an in-line comment to the main discussion thread, since the conversation is getting lengthy)

> I don't understand what's bad about transitioning to COMPLETE in a task after promise_setup has fulfilled (using setTimeout(..., 0)), and returning a thenable from promise_setup. Or maybe, if onload hasn't happened yet, wait for that. It's arbitrary, sure, but so is the current onload handling, and as shown in my previous example, it can miss tests. But testharness.js optimizes for test writing ergonomics.

Making arbitrary decisions for such a subtle behavior is risky. It's possible to over-optimize for ergonomics. Happiness is a warm footgun.

> I wonder if there isn't a pattern for promise-returning functions one could follow here? For things that fire events it's very straightforward to do some stuff after any listeners have been invoked; is there no equivalent for promises?

I'm not aware of a pattern, either. That might just be a sign that we're out of touch, but it could also suggest that there is no way to address this generically.

As a straw-man, we could design testharness.js to defer harness completion until after microtasks waiting on promise_setup have run:

```js
function promise_setup(fn) {
  const p = fn();

  p.then(() => p.then(done));

  return p;
}
```

My concern is that this behavior exposes too sharp an edge for test authors. I worry that people might intuitively write the following code:

```js
await promise_setup(my_setup);
test(() => {}, 'a');
```

And seeing this work as expected, they might just as intuitively write:

```js
await promise_setup(my_setup);
test(() => {}, 'a');
await somethingElse();
test(() => {}, 'b');
```

They'd do this without really considering the feasibility of such an API. The risk comes from the fact that errors like this can't be detected until after results have been reported. In the worst cases (like when there are many subtests and they're expected to pass), authors may be unaware that the late subtests are never executed.

zcorpan commented 4 years ago

> And seeing this work as expected, they might just as intuitively write:
>
> ```js
> await promise_setup(my_setup);
> test(() => {}, 'a');
> await somethingElse();
> test(() => {}, 'b');
> ```

This is a good point.

OK. Instead of guessing when to transition to COMPLETE, we could make promise_setup() imply explicit_done: true, i.e. the test needs to call done() once all tests have been created.
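
Under that suggestion, usage might look something like the sketch below, assuming promise_setup() implies explicit-done behavior; the API is not settled.

```js
// Sketch only: promise_setup() would put the harness into "explicit done"
// mode, and the test file calls done() once every subtest has been created.
promise_setup(async_operation);

promise_test(async () => { /* ... */ }, 'first subtest');
promise_test(async () => { /* ... */ }, 'second subtest');

done();  // signals that no further tests will be declared
```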

jgraham commented 4 years ago

Well you aren't really supposed to have top level code like await somethingElse(). I feel like this is pretty close to the existing situation where you can do something like

```js
onload = () => {
  test(() => assert_true(true), "Defined in load");
  setTimeout(() => {
    test(() => assert_false(false), "Defined after load");
  }, 0);
};
```

In that case one test will be found and the other won't.

zcorpan commented 4 years ago

I think an interesting question is whether there are tests that would need promise_setup() but also have to use test or async_test. If there aren't such cases, we can support only promise_test.

IDL tests were brought up, but I don't know why they shouldn't use promise_test.

jugglinmike commented 4 years ago

Another argument against await promise_setup(my_setup) is that it doesn't seem to offer any benefit over await my_setup().

foolip commented 4 years ago

> IDL tests were brought up, but I don't know why they shouldn't use promise_test.

Making idlharness.js use only promise_test would probably work, but it would change the timing of when tests are run. That would probably break some tests that refer to a loop variable that will have changed by the time the tests run.
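
A hypothetical illustration of that breakage (not taken from idlharness.js): with promise_test the callback runs after the loop has finished, so a var-scoped loop variable no longer holds the value the author intended.

```js
// By the time each promise_test callback runs, `i` has advanced to
// members.length, so members[i] is undefined in every subtest.
const members = ['a', 'b', 'c'];
for (var i = 0; i < members.length; i++) {
  promise_test(async () => {
    assert_own_property(self, members[i]);
  }, 'member ' + i);  // the name is computed immediately, so it is still correct
}
```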

I think just not using promise_setup in idlharness.js will be fine; we can use setup({explicit_done: true}) instead.

jugglinmike commented 4 years ago

It's been one week since the previous comment, and I don't believe there is any substantive disagreement. Can this be merged, or would any participant prefer to extend the discussion to two weeks?

foolip commented 4 years ago

Oh, I was under the impression that the behavior of test and async_test when combined with promise_setup still wasn't settled, and I hadn't reviewed the RFC text in detail. I see that test and async_test aren't explicitly covered in the RFC text; can you add that? Then I'd like to let another week pass, unless I was the only one confused about the status of this. 3 others saying that this LGTM as-is would make me happy to not delay it any further.

Ms2ger commented 4 years ago

> Oh, I was under the impression that the behavior of test and async_test when combined with promise_setup still wasn't settled, and I hadn't reviewed the RFC text in detail. I see that test and async_test aren't explicitly covered in the RFC text; can you add that? Then I'd like to let another week pass, unless I was the only one confused about the status of this. 3 others saying that this LGTM as-is would make me happy to not delay it any further.

Same for me.

jugglinmike commented 4 years ago

Updated according to https://github.com/web-platform-tests/rfcs/pull/32#issuecomment-538524179

jugglinmike commented 4 years ago

Another week has passed. Is this ready to land?

foolip commented 4 years ago

@jugglinmike yep, enough time has passed!