Allow for tests nested within tests

Krinkle commented 3 years ago

Allow tests to directly contain both assertions and other tests.

Per https://github.com/js-reporters/js-reporters/issues/117#issuecomment-713903902 this is required for node-tap and tape. This does limit a bit how reporters can visualise and lay out the data, but seems worth doing. Especially as it aligns us closer with TAP.

For HTML reporters like QUnit that provide a collapsible list of assertions for each, they may need to buffer each test and then render the list of assertions first, followed by the sub tests. This is a small price to pay for widening up the surface of test frameworks and reporters that can participate. It also wouldn't negatively affect any reporters that exist today since those are currently specific to frameworks frameworks that would never excercise that need for buffering, so it's only a win-win to allow the reporter to be user more widely.

Consider phasing out the "Suite" concept. Most likely obsolete after this.

Krinkle commented 3 years ago

Requiring consumers to support assertions and nested tests under the same concept does limit the choices a reporter can make for their visual or textual layout. For assertions to appear between tests, and tests between assertions.

I realize this may insignificant from a TAP perspective. From what I've seen in the TAP community, frameworks are generally exception-based, runners stop after the first error, and reporters only communicate tests and failed assertions. By not rendering multiple errors at different levels, and by not making passed assertions accessible, this means one conveniently sidesteps this issue. QUnit is not exception-based, may provide multiple errors from a single run, and does make its passed assertions accessible via a click or flag. Below is an example of its HTML report, with two of the tests expanded to show its assertions:

I suppose individual consumers wishing to render things this way could workaround this limitation by buffering any nested tests, and rendering the direct assertions first. This would slightly misrepresent the true data, but seems and is no worse than what they can do today. It just means they can be used by a wider range of test frameworks.

Krinkle commented 3 years ago

I'm running into a few issues when trying to merge the "suite" and "test" concepts.

The simplest approach, I thought, would be to simply convert suites to tests and call it a day. However, this is proving to be more difficult than I thought. Our TAP reporter, ignores "suite" events. It only acknowledges suite as a concept by means of prefixing the test names (which are provided as part of the test name, so technically it's entirely unaware of suites).

Before events:

runStart: (total: 2)
suiteStart: "some suite"
testStart: "test A" (parent: "some suite")
testEnd: "test A" (parent: "some suite", status: passing)
testStart: "test B" (parent: "some suite")
testEnd: "test B" (parent: "some suite", status: passing)
suiteEnd: "some suite" (status: passing)
runEnd: (total: 2, passing: 2, failing: 0, skipped: 0, todo: 0)

Before TAP:

ok 1 some suite > test A
ok 2 some suite > test B
1..2

When we alllow nesting in tests, and emit existing suites as tests, we see two potentially unexpected side-effects. Firstly, each suite now counts as a test, so the number of tests is higher, even if all these former-suites don't and can't have assertions. That's fine I suppose. Secondly, it means we are outputting them after their children because TAP is based on when a test result has come in, and naturally children end before their parent.

Possible future, events:

runStart: (total: 3)
testStart: "some suite"
testStart: "test A" (parent: "some suite")
testEnd: "test A" (parent: "some suite", status: passing)
testStart: "test B" (parent: "some suite")
testEnd: "test B" (parent: "some suite", status: passing)
testEnd: "some suite" (status: passing)
runEnd: (total: 3, passing: 3, failing: 0, skipped: 0, todo: 0)

Possible future, TAP:

ok 1 some suite > test A
ok 2 some suite > test B
ok 3 some suite
1..3

We already knew that the TAP spec doesn't (yet) have a standard for sub tests (ref https://github.com/TestAnything/Specification/issues/2), but there's a couple of (mostly back-compat) ways this is done today by node-tap, and its tap-parser. It provides a child event, which consumers can use to mark sub tests in some way, e.g. by indenting, prefixing or otherwise wrapping the inner tests.

This seems analogous to the "suite" events js-reporters have today, so maybe we shouldn't be merging these concepts after all. Rather we just need to add support for nesting tests.

We can continue to provide "suite" as a way of transparently grouping tests. Individual test frameworks and adapters don't have to use these of course. If their grouping unit likely to have assertions directlyh in it (not bail outs, but regular failures) then it might want to use "test" for both the group and the unit, but for transparent grouping of tests we can continue to provide "suite".

Would that make sense?

Krinkle commented 3 years ago

node-tap, for comparison:

const tap = require('tap');

tap.test('foo', (t) => {
    t.test('bar', (t) => {
        t.end();
    });
    t.end();
});

$ node_modules/.bin/tap -R tap tmp.js
TAP version 13
ok 1 - tmp.js # time=28.006ms {
    # Subtest: foo
        # Subtest: bar
            1..0
        ok 1 - bar # time=2.908ms

        1..1
    ok 1 - foo # time=19.055ms

    1..1
    # time=28.006ms
}

1..1
# time=5334.686ms

Krinkle commented 3 years ago

Would that make sense?

I'm still interested in hearing other thoughts, but, now that I've done most of the code changes required for this, I'm coming around (once again) to the idea that we don't need suites. We really should just represent suites as tests, I think.

The next thing I'm running into is the status and errors fields for TestEnd. Once a child test has failed, I'm not sure how the enclosing test should behave. It seems intuitive to propagate the error status, surely a parent should not succeed if one of its children is failing, that seems clear enough. But, what about errors? On the one hand it seems odd for producers to have to copy around and propagate these and reporting the same error multiple times. On the other hand, it also seems odd for consumers to deal with a test that has status: failed and not have an error object to explain the error.

Should we consider it normal for a test to be failed and yet have no errors? We woudl trust the producer (test framework/adapter) to have emitted it before if it was from a child, and thus consumers need to get used to that and present it in a way that makes sense. E.g. they can no longer say "it failed and here is why". Even some kind of rich IDE wouldn't be able to correlate the two with high confidence, if e.g. one clicks on the parent test, beyond e.g. falling back to showing its child test errors and hoping they are indeed the reason. I guess that makes sense. We might want to formalise it in the spec (possibly non-normative) that producers should only ever omit errors on a failing test if it failed due to a child test.

qunitjs / js-reporters

Allow for tests nested within tests #126