whatwg / streams

Streams Standard
https://streams.spec.whatwg.org/

Tests contain non-zero timers #548

Open ricea opened 8 years ago

ricea commented 8 years ago

Short timers make tests flaky. From the Testharness.js API Documentation:

Note that timeouts generally need to be a few seconds long in order to produce stable results in all test environments.

Long timeouts slow down all developers by making the tests take longer to run. They also waste resources on automated testing infrastructure.

Mostly, timers are being used to check that the implementation never takes some action that it shouldn't. For example (from writable-streams/start.js):

promise_test(() => {
  const ws = recordingWritableStream({
    start() {
      return Promise.reject();
    }
  });

  // Wait and verify that write or close aren't called.
  return delay(100)
      .then(() => assert_array_equals(ws.events, [], 'write and close should not be called'));
}, 'underlying sink\'s write or close should not be invoked if the promise returned by start is rejected');

This test never[1] flakes on a correct implementation. Suppose, hypothetically, that there existed an implementation which called close() here after asynchronously performing an action that takes an arbitrary amount of time. Then this test would be flaky on that implementation: it would fail when the action took <100ms, and pass otherwise.

I propose that we make the simplifying assumption that no reasonable implementation will do the wrong thing after a delay. We can then disregard the above hypothetical implementation as being unreasonable, and so consider its false positive on the above test to be acceptable.

I furthermore propose that we consider two trips around the event loop to be long enough to see whether an implementation is going to do the wrong thing or not. My reasoning is that the implementation may run a callback after the user code, which then posts another async callback that finally does the wrong thing.

I think we should have a function flushAsyncEvents() which is equivalent to delay(0).then(delay(0)). The above test would then look like

promise_test(() => {
  const ws = recordingWritableStream({
    start() {
      return Promise.reject();
    }
  });

  // Verify that write or close aren't called.
  return flushAsyncEvents()
      .then(() => assert_array_equals(ws.events, [], 'write and close should not be called'));
}, 'underlying sink\'s write or close should not be invoked if the promise returned by start is rejected');

[1] Test frameworks usually have a hard timeout, which means that even passing tests will fail occasionally. The longer a test takes, the more likely it is to hit the hard timeout. This is seen in practice on the Blink layout tests.

domenic commented 8 years ago

This sounds good to me. Minor correction: delay(0).then(() => delay(0)).

It doesn't take care of some remaining cases where we're actually using the timers for sequencing, but we can have a separate discussion about those.
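For concreteness, here's a rough sketch of what delay() and flushAsyncEvents() could look like (hypothetical; the real versions would live with the other shared test helpers):

// Hypothetical sketches of the helpers discussed above.

// Resolves after roughly |ms| milliseconds.
function delay(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// Goes around the event loop twice, so that a callback queued by another
// callback still gets a chance to run before anything is asserted.
function flushAsyncEvents() {
  return delay(0).then(() => delay(0));
}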

ricea commented 7 years ago

Some notes from recently porting some tests to not use non-zero delays:

If there is only one delay() in the test, then it's probably only being used to test async behaviour, and can just be replaced by delay(0). If there are negative assertions (assertions that something has not happened) in the test then you may need to use flushAsyncEvents().
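As a sketch of the first case (the event names asserted here are an assumption about the recording helpers):

// Hypothetical single-delay test: the timer only exists to exercise an
// asynchronous code path, so delay(0) is a drop-in replacement for delay(50).
promise_test(() => {
  const ws = recordingWritableStream();
  const writer = ws.getWriter();
  return delay(0)  // was delay(50); real time was never the point
      .then(() => writer.write('a'))
      .then(() => assert_array_equals(ws.events, ['write', 'a'],
                                      'the write should reach the sink'));
}, 'a delayed write should reach the underlying sink');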

Even many tests that use delay(n) for sequencing can easily be rewritten without non-zero delays. Here are four examples, only one of which results in non-deterministic ordering:

// Example (1)
delay(10).then(a);
delay(20).then(b);

// Example (2)
delay(10).then(a);
someOtherEvent.then(() => delay(20)).then(b);

// Example (3)
someOtherEvent.then(() => delay(10)).then(a);
delay(20).then(b);

// Example (4)
delay(10).then(a);
delay(10).then(b);

In examples (1), (2) and (4), a happens deterministically before b, and so they can be rewritten using chained promises. In example (3) the order may be nondeterministic.
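For instance, assuming the only requirement in Example (1) is that a runs before b (both still asynchronously), it could be rewritten as:

// Example (1) without non-zero delays: a is still guaranteed to run before b,
// but no wall-clock time is involved.
delay(0)
    .then(() => a())
    .then(() => b());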

In the general case, tests with non-zero delays implicitly test the ordering of events and should be rewritten to explicitly test ordering instead. Usually the recording stream test helpers in streams/resources/recording-streams.js are the right tools for this.
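For example, here is a sketch of asserting the ordering explicitly through the recorded events (the exact event encoding, 'write' followed by the chunk and then 'close', is an assumption about the helpers):

promise_test(() => {
  const ws = recordingWritableStream();
  const writer = ws.getWriter();

  // The ordering expectation is now explicit in the recorded events rather
  // than implied by a timer.
  return writer.write('a')
      .then(() => writer.close())
      .then(() => assert_array_equals(ws.events, ['write', 'a', 'close'],
                                      'the write should be recorded before the close'));
}, 'write should reach the underlying sink before close');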

The tricky part is identifying those implicitly tested ordering expectations. It's easy to rewrite a test so that it looks the same and still passes, but accidentally reduce its power to identify incorrect implementations.