Test order is affected by a presence of a nested test

ArtemGovorov commented 8 years ago

The following code:

var test = require('tape');

test('first', function (t) {

  t.plan(2);

  setTimeout(function () {
    t.ok(1, 'first test');
  }, 200);

  t.test('second', function (t) {
    t.ok(1, 'second test');
    t.end();
  });
});

test('third', function (t) {
  setTimeout(function () {
    t.ok(1, 'third test');
    t.end();
  }, 100);
});

outputs:

ok 1 first test
ok 2 second test
ok 3 third test

and it's expected. But if I change it to its logical equivalent without plan:

var test = require('tape');

test('first', function (t) {

  setTimeout(function () {
    t.ok(1, 'first test');
    t.end();
  }, 200);

  t.test('second', function (t) {
    t.ok(1, 'second test');
    t.end();
  });
});

test('third', function (t) {
  setTimeout(function () {
    t.ok(1, 'third test');
    t.end();
  }, 100);
});

the the test order is broken:

ok 1 second test
ok 2 third test
ok 3 first test

While changing the order of the first and the second test is somewhat expected, ending the first test prematurely, and letting the third test execute before it, looks like a bug. Especially considering that commenting t.test('second'... test out restores the expected order, and make the third test wait until the first one ends.

davegomez commented 8 years ago

Is anyone checking at this? @substack @ljharb

GuidoS commented 8 years ago

I think this may be tied to this other open issue https://github.com/substack/tape/issues/122

Raynos commented 8 years ago

This is not an ordering problem; this is a reporting problem.

t.end() will wait for all child tests to finish.

I agree this is confusing semantic difference between t.end() and t.plan().

I recommend adding console.log() to convince yourself of the actual order of execution and to not worry about which order tape reports TAP statements to stdout.

ArtemGovorov commented 8 years ago

@Raynos From what I see, it is definitely an ordering issue, not a reporting one. Adding console.log (with timestamps just in case), produces the same result:

var test = require('tape');

test('first', function (t) {

  setTimeout(function () {
    console.log('first test', +new Date());
    t.ok(1);
    t.end();
  }, 200);

  t.test('second', function (t) {
    console.log('second test', +new Date());
    t.ok(1);
    t.end();
  });
});

test('third', function (t) {
  setTimeout(function () {
    console.log('third test', +new Date());
    t.ok(1);
    t.end();
  }, 100);
});

second test 1451094351565
third test 1451094351676
first test 1451094351756

Raynos commented 8 years ago

@ArtemGovorov

Your right; this is simply broken.

Raynos commented 8 years ago

I honestly do not know enough about what the semantics of sub tests are supposed to be to fix this...

It should either be:

second test first test third test

or

first test second test third test

ArtemGovorov commented 8 years ago

I think it should be

first test
second test
third test

because sub tests are not much different from the same level tests in a sense that they may need the previous/parent test to be finished. For example, nested tests may use a context that is set up in the parent test and may need the parent test to finish working with the context before a nested test can start using the context.

ljharb commented 8 years ago

I would expect second, first, third, based on my expectation of subtest ordering - the setTimeout means that t.end doesn't run on the current tick, so all the subtests of "first test" would run before the next tick starts, and "first test" gets logged.

tape follows JS scoping rules, so the ordering of the tests shouldn't affect anything unless a subtest mutates something another subtest depends on.

ArtemGovorov commented 8 years ago

@ljharb Why allowing the second test to run before the first one is finished? Except the nesting level, the second and the third tests are not much different conceptually, so why having different rules for the sub tests?

ArtemGovorov commented 8 years ago

In other words, the first test may be doing things in this scope:

test('first', function (t) {
  ...

as well as in this scope:

  setTimeout(function () {
     ...
  }, 200);

Allowing the second test to run before the first one is finished, means potentially interrupting the work of the first test. The isolation of the first test is broken.

ljharb commented 8 years ago

Using setTimeout in this fashion seems very strange to me - you're actually delaying creation of the subtest. If you do the setTimeout inside the subtest, I suspect you'll be much less surprised by the test ordering. Can you elaborate on why you're doing that?

Raynos commented 8 years ago

The bug here is that sub tests work differently purely on the existence of plan()

ArtemGovorov commented 8 years ago

@ljharb So I had a code like this:

var test = require('tape');

test('first', function (t) {

  setTimeout(function () {
    t.ok(1, 'first test');
    t.end();
  }, 200);
});

test('third', function (t) {
  setTimeout(function () {
    t.ok(1, 'third test');
    t.end();
  }, 100);
});

The first test was finishing before the third one. Then I added a subtest:

var test = require('tape');

test('first', function (t) {

  setTimeout(function () {
    t.ok(1, 'first test');
    t.end();
  }, 200);

  t.test('second', function (t) {
    t.ok(1, 'second test');
    t.end();
  });
});

test('third', function (t) {
  setTimeout(function () {
    t.ok(1, 'third test');
    t.end();
  }, 100);
});

Adding the second test introduced 2 issues:

the second test (subtest) is executed before the first test is finished;
the third test is now executed before the first test is finished.

Regarding the first issue - semantically, subtests are separate tests. They're executed after their parent test. That's what happens when the parent test in synchronous or has .plan call like here. Async parent test without the explicit .plan call behaviour is just inconsistent with the rest.

If you do the setTimeout inside the subtest

I know I may do the setTimeout inside the subtest, or may define the subtest inside the setTimeout, or may add .plan just like in the example I gave in the initial issue report. The issue with inconsistent behaviour remains though, as @Raynos said, and I agree, the bug is that sub tests work differently purely on the existence of plan().

Regarding the second issue - no matter how I structure the first test subtests, the third test should never execute before the first one is finished. It is inconsistent with tape's promise to execute tests sequentially.

ljharb commented 8 years ago

Gotcha - I always use plan so I haven't ran into it. I agree that the existence of plan shouldn't affect ordering of the tests.

kl0tl commented 8 years ago

This seems to be a consequence of supporting nested tests that implicitly end their parent.

See the test for this behavior.

Something that let any asynchronous process to settle instead of nextTick() should work. Any idea ?

sladiri commented 8 years ago

Hi, as a beginner and maybe missing some bigger picture:

This does not look like a bug, because that is how the runtime works. ~~I have not used plan and I did not know that it preserves the order somehow~~. If I do a setTimeout(), the order of the three tests in the example without plan() is exactly what I would expect to happen. Any other behaviour caused by tape would confuse me. I like the goal for tape not to introduce some magic like other test-runners do, and I think changing the current behaviour would introduce such magic.
Apparently plan() preservers order explicitly as an option and I am fine with that. That is actually a bonus I didn't even expect to get, although its name does not hint at this behaviour. For asynchronous functions I have been using Promises and then() for example for preserving some order. Now those introduce problems with stack traces but I guess that is a different story.
Actually, with asynchronous functions I have been using blue-tape, where I return a Promise from a test, and t.end() is called for me. This way I can easily avoid the anti-pattern of calling promise.resolve() explicitly (and the call to t.end() in the example with plan() is a analogous anti-pattern I think). A test using asynchronous functions is always wrapped inside a promise, and no errors can "disappear" for example.

ArtemGovorov commented 8 years ago

If I do a setTimeout(), the order of the three tests in the example without plan() is exactly what I would expect to happen. Any other behaviour caused by tape would confuse me.

The issue is that the third test starts before the first test is finished (with 3 tests), AND if you remove the second test, then the third test doesn't start the first test is finished. Given that tape tests are executed serially, I think no matter sync or async, with our without plan, the third test should never start before the first one is finished. Starting a next test while a previous one is still not finished leads to issues with a shared context.

sladiri commented 8 years ago

Starting a next test while a previous one is still not finished leads to issues with a shared context.

Your example without plan() does the following in Promise-terms, where the order of execution is not deterministic (which is fine if there is no shared mutable context).

// pseudo code
Promise.all([ test1, test2, test3 ])

In order to execute test3 always after the first two, because it shares some mutable context with the others for example, you have to seperate their "async contexts", and I think that this should not be tape's responsibiltiy:

Promise.all([ test1, test2 ])
  .then(Promise.resolve( test3 ))

To depend on deterministic order for asynchronous execution is an anti pattern as far as I understand.

PS I have not used plan() yet, I did not like to count my tests in order to have the correct number of subtests at the start. And I have not used sub tests for async functions like in your example either, because in my case blue-tape does not automatically fail a supertest when one of a subtests fails.

ljharb commented 8 years ago

(Promises aren't quite the same - a promise represents a future value for work that has already begun, whereas a callback is the work itself)

sladiri commented 8 years ago

(Promises aren't quite the same - a promise represents a future value for work that has already begun, whereas a callback is the work itself)

In the example you are not working with the callbacks directly, but passing them to seTimeout() and setTimeout() is some work to be done in the future, isn't it?

ljharb commented 8 years ago

that is true. but the test itself is not passed to setInterval, and it's not finished until "end()" is called (or the timeout reached).

sladiri commented 8 years ago

Of course, and I believe that the language itself provides solutions for this and that it is not tape's responsibility to solve this problem, that is all.

ArtemGovorov commented 8 years ago

@sladiri

Your example without plan() does the following in Promise-terms, where the order of execution is not deterministic (which is fine if there is no shared mutable context).

// pseudo code
Promise.all([ test1, test2, test3 ])

I don't think it's how most of test frameworks I know (including tape) work.

This code works as I expect and executes the first test before the third test:

var test = require('tape');

test('first', function (t) {

  setTimeout(function () {
    t.ok(1, 'first test');
    t.end();
  }, 200);
});

test('third', function (t) {
  t.ok(1, 'third test');
  t.end();
});

No matter how long does the first test take to execute, the third one will be executed after it. It's when I insert the second subtest in the first test, it starts to show the unexpected behaviour.

kl0tl commented 8 years ago

Quick followup after my last post. Listening to beforeExit instead of calling nextTick (https://github.com/Kl0tl/tape/commit/099488d054b2ce1041472acdaae2dbd2daa4e686) yields the expected order but causes nested-sync-noplan-noend to hang. I’m not really sure why.

I played a bit with Angular zone.js library. It works but it breaks even more tests :’D

It’s interesting to note that this simply doesn’t happen with blue-tape and Async Functions. It seems to be the most elegant solution to this issue. Could it be enough to document this use case and educate users ?

kl0tl commented 8 years ago

I fixed the failing test, listening to beforeExit should be enough to work around this issue !

kl0tl commented 8 years ago

Oh well, neither beforeExit nor timers unref() method work inside a browser environment. Looks like zones are the only way to preserve this behavior.

sladiri commented 8 years ago

It’s interesting to note that this simply doesn’t happen with blue-tape and Async Functions. It seems to be the most elegant solution to this issue. Could it be enough to document this use case and educate users ?

I have some examples using blue-tape. I like blue-tape because it makes it easier to use Promises, and generators. It does not directly accept generator functions, that would be great as well. I would like that blue-tape's functionality and more gets into tape, but I guess that it is a bit early ~~~since the spec is not standard yet~~~.

ljharb commented 8 years ago

@sladiri Promises and generators have been standard since June 2015

sladiri commented 8 years ago

@ljharb, thank you. I mixed that up ~~~with ES6 syntax~~~.

I tried to create an example on how one could use Promises to control the order of execution. https://github.com/substack/tape/pull/257

More complex examplse may be incompatible with reporting and it does discourage nested tests, doesn't it?

ljharb commented 8 years ago

@sladiri to clarify, ES6/ES2015 is what was standardized June 2015, which includes Promises and Generators.

davegomez commented 8 years ago

Any hope on fixing this?

I am struck by the lack of interest from @substack

kl0tl commented 8 years ago

@ArtemGovorov Is this something you absolutely need fixed to integrate Wallaby with tape ?

There’s a handful of choices if you can’t workaround this bug:

Document this issue and encourage users to use plan or prefer blue-tape and Async Functions if available should they ever need an asynchronous test with children
Deprecate the implicit termination of parents tests and remove it in a future release
Fix the damn bug with zone.js (or something similar) but it feels a bit intrusive to basically monkey patch every asynchronous API (could be interesting to expose a browser only version of tape through the browser field of package.json to be able to solve this in a simpler way on Node.js)

Did I miss anything ? @ljharb @Raynos

ArtemGovorov commented 8 years ago

@Kl0tl I have only been working on integrating node version so far, not the browser one, so would be great if it was fixed at least for node. The distilled issue itself is just a tip of an iceberg of various things that can go wrong because of it. I would really like to avoid having to support a half-working version of the integration in wallaby and spend a lot of time trying to figure out if it's something I missed in the integration or if it is a result of this issue.

ljharb commented 8 years ago

Zones are not a standard (merely a stage 0 proposal) and are a non-option. Anything that monkeypatches anything in a reusable module is also a non-option, ever.
Documentation is great - a PR to update the readme would be welcome. It can always be removed when/if it is fixed.
any fix must work in both browsers and node
This is a bug that we all agree needs to be fixed. PRs welcome!

sladiri commented 8 years ago

I tried to create example which shows how to test ~~~a Koa.js server~~~ with blue-tape and promises. https://github.com/substack/tape/pull/259

kl0tl commented 8 years ago

@ljharb Would a fix for Node only and an update to package.json with a browser field exporting the Iibrary without the fix ok? As far as I understand there is no way to fix this in a browser environement currently :( PR incoming !

@ArtemGovorov In the meantime you can try with github:kl0tl/tape#beforeExit instead of tape, it should be fixed there.

ljharb commented 8 years ago

@Kl0tl I'd prefer the same bugs be present consistently across every environment. One of the primary use cases for tape is testing things in both browsers and node simultaneously, and differences across the two will be problematic.

kl0tl commented 8 years ago

@ljharb Yeah, consistency wins you’re right. I’ll document this behavior inside a new #Troubleshooting section of the readme.

What do you think about bringing Promises support into core now they’re standard ? I know there’s blue-tape but Async Functions are very convenient to use for async tests and obliterate this bug.

@ArtemGovorov Sadly this bug is here to stay :(

I hope it doesn’t prevent Wallaby to integrate with tape ! Would it be enough to encourage users facing this issue to avoid this pattern or use Async Functions if possible ?

ljharb commented 8 years ago

tape mirrors node core's assert - if assert gets a Promise interface, then so too should tape.

kl0tl commented 8 years ago

Promises support would still mirror assert. It’s not about modifying existing assertions but allowing users to return promises from their tests.

ljharb commented 8 years ago

Supporting that is tricky because we'd have to either a) assume Promise was available, which forces the user to know they need to shim it, b) allow explicit injection of Promise so they can pass bluebird or native Promises or whatever as they like (like .setPromiseConstructor() or something), or c) add a Promise lib as a dependency.

Solution a is dangerous, b is awkward, and c is heavyweight. If you want to open a new thread to discuss it (not on this one) we can, but I don't think it's likely.

kl0tl commented 8 years ago

Discussion on promises moved to https://github.com/substack/tape/issues/262.

kl0tl commented 8 years ago

I finally took time to document the issue!

It seems this bug can’t be fixed reliably across node and browsers :( Would it be conceivable to deprecate the implicit termination of parents tests and remove it in a future release? If not there’s not much more to do regarding this.

medikoo commented 7 years ago

If I understand problem correctly there are only two possible fixes:

A. Resign from support of no plan and no end test variants (as tested here) alternatively such behavior could be supported via extra option or alternative function.

B. Tweak setTimeout for time-being of callback being called, so internals have capability to track remaining callbacks, therefore wait for them before moving forward.

In my eyes A is cleaner, still I wonder how widely developers rely on that. If it's a no go, then it might be not solvable. B is really dirty, and one of main reasons that devs rely on Tape is because it doesn't mess with global state.

@ArtemGovorov can you explain why exactly it's a blocker for Wallaby? Aside of this issue, I can imagine some race conditions in asynchronous tests which will still make asserts to not go down always in same order (and it could be seen as acceptable).

nhamer commented 7 years ago

The problem isn't that 'first' actually starts terminating after 'third', it's that the entire block in setTimeout runs after 'first' has ended (implicitly, due to subtest without plan as @Kl0tl noted)

test('implicitEnd', function(t) {
   setTimeout(function() {
     t.ok(!t.ended); // fails - t has already implicitly ended
     t.end(); // no failure, as there would be with multiple ends, but completely unnecessary
     t.ok(true, 'assert after end'); // succeeds, and shouldn't because t has ended 
   }, 200);
   t.test('subtest', function(t2) {
     t2.ok(true, 'subtest assertion succeeds');
     // t.ended === undefined
     t2.end(); // implicitly ends t
     // t.ended === true
   });
});

this is basically just https://github.com/substack/tape/issues/264

nhamer commented 7 years ago

This behavior was introduced in https://github.com/substack/tape/commit/35f29a as part of the fix for https://github.com/substack/tape/issues/33

I suggest that https://github.com/Kl0tl/tape/blob/master/test/nested-sync-noplan-noend.js#L1 should hang, since it doesn't have have a plan or end, which violates the documented behavior "If you forget to t.plan() out how many assertions you are going to run and you don't call t.end() explicitly, your test will hang."

I think the right fix here is to end the test upon subtest completion, but only if t.end() has been called already and conversely defer t.end() if subtests are queued. createHarness needs an option autoEnd for the exported harness to call t.end() after all subtests.

ljharb commented 5 years ago

Unfortunately the fix to this (#403) had to be reverted, since it caused #459.

@nhamer, it'd be great to try to address this again, but taking into account #459.

tape-testing / tape

Test order is affected by a presence of a nested test #222