qunitjs / js-reporters

📋 Common Reporter Interface (CRI) for JavaScript testing frameworks.
MIT License
59 stars 18 forks source link

Request for comment: CRI Version 1 #117

Closed Krinkle closed 3 years ago

Krinkle commented 3 years ago

I've turned everything we have so far into a draft specification: https://github.com/js-reporters/js-reporters/blob/main/spec/cri-draft.adoc

To gain experience, it is currently used in stable releases or production versions of:

Objective

To publish Version 1 of the specification "soon".

This ticket is to track the review round during which we can answer any pending questions or past issues that we want to solve as part of Version 1. I think this area is interesting enough that there is definitely space for lots of ideas and future considerations. But, to deliver a long-term and stable Version 1, I think we need to limit our scope through agreed-upn criteria.

Proposed criteria:

These are admitedly quite strict and narrow. Maybe they're feasible, but if not feel free to suggest changes and widening. It's only meant as a way to start and focus the conversation.

Open questions

isaacs commented 3 years ago

Some challenges with this for node-tap:

Krinkle commented 3 years ago

Interesting, I had not considered those as similar, but you're right. I'd support merging the "test" and "suite" concepts. It might require some compromises, but I think it's worth giving a try.

Currently:

I have not previously seen testing frameworks that allow nesting within their equivalent of the "test" concept. I see that node-tap does allow this. As such, node-tap's tests are effectively a hybrid. Like a "suite" that can also directly contain assertions.

The main issue I see with merging these concepts is that it might affect support for "planning" the execution of a run. Right now, we specify a structure that reporters can read that describes how many tests will execute and in which suites they are grouped, before execution begins.

@isaacs How does this work with node-tap? The TAP spec defines a "plan" (spec), which Mocha, QUnit, and Jasmine effectively implement by waiting until all test files have been loaded, seeing what "suites" have been called, but not actually executing the "test" portions yet. It will then know this information, and then can start execution.

I'm generally in favour of removing the structured plan (e.g. the whole suite structure) since this is afaik never used in practice by reporters. A stream-based approach seems preferable there. However an overall count of planned tests seems useful to keep, allowing reporters to track and communicate a sense of progress toward completion.

ljharb commented 3 years ago

ftr, tape also allows nesting tests (groups of assertions), and has no concept of "suites". In tape, you can t.plan(n), but you can also call t.end() or return a promise, because no plan is needed in a synchronous test, or when the promise can signal completion.

In other words, tape simply doesn't have a full count of tests until after the full run is completed.

isaacs commented 3 years ago

Yeah, tape is sort of in the TAP family of test paradigms (along with ava, to some extent), so most of what's said about node-tap would apply to tape as well.

How does this work with node-tap? The TAP spec defines a "plan" (spec), which Mocha, QUnit, and Jasmine effectively implement by waiting until all test files have been loaded, seeing what "suites" have been called, but not actually executing the "test" portions yet. It will then know this information, and then can start execution.

The tap runner does not load all the test files ahead of time. All tests are run as subprocesses to ensure a clean environment, and their TAP stream is consumed from the child process stdout. A pool of tests are loaded in parallel, by default up to the count of CPUs on the system. As tests finish, new ones are kicked off, but the output order is always consistent (if you specify -R tap, at least).

Within a given test file (ie, in the child process), the object returned by require('tap') has a t.plan() method which can be used to specify a number of top-level tests to expect, but the default behavior is to end automatically once all tests are completed and the process ends.

tap.Test objects (including the root test object returned by require('tap')) are created by calling t.test(name, (t) => ...) and passed as an argument to the function. The function can either call t.end() explicitly at some point, use t.plan() to specify a number of test points to expect, or return a Promise. They can also create subtests, which can create subtests, etc, to any arbitrary depth.

The top-level tap test object doesn't have to create subtests. This is a perfectly valid test script in node-tap:

const t = require('node-tap')
t.pass('this is fine')

If they call t.end() synchronously, or specify a plan which is immediately resolved, tests are run in a single tick, immediately moving on to the next test. So for example:

// all console.error calls will happen in order
const t = require('tap')
console.error('one')
t.test('child test', t => {
  console.error('two')
  t.pass('okie dokie')
  console.error('three')
  t.end()
  console.error('four')
})
console.error('five')
t.test('second child', t => {
  console.error('six')
  t.pass('fine')
  console.error('seven')
  t.end()
  console.error('eight')
})
console.error('nine')

Will produce the output:

$ node t.js
one
TAP version 13
two
# Subtest: child test
    ok 1 - okie dokie
three
    1..1
four
ok 1 - child test # time=2.915ms

five
six
# Subtest: second child
    ok 1 - fine
seven
    1..1
eight
ok 2 - second child # time=0.761ms

nine
1..2
# time=9.445ms

The runner works by getting a list of all the test files that it needs to run, and then calling t.spawn() to run them all from the root tap object (the same object that test files, albeit in a different process). If a reporter is used (other than -R tap of course) then the output of that root tap object is sent to the reporter rather than dumping to stdout. (-j1 means "only one job, don't do the buffering/parallelization stuff", trying to keep it simple here.)

$ tap -Rtap t.js -j1
TAP version 13
one
# Subtest: t.js
two
three
    # Subtest: child test
        ok 1 - okie dokie
        1..1
four
five
    ok 1 - child test # time=2.849ms

six
seven
eight
    # Subtest: second child
        ok 1 - fine
nine
        1..1
    ok 2 - second child # time=0.629ms

    1..2
    # time=9.076ms
ok 1 - t.js # time=9.076ms

1..1
# time=413.414ms

So, really, there is no difference between a "suite" and a "test", we don't often know ahead of time how many tests there will be, and tests can have subtests so really either there's no such thing as a "suite" or it's suites all the way down, and the objects that this spec talks about emitting events for one another are all in different process memory spaces.

In the default reporter, I do report the number of "suites", but that's just the number of test files, which is an arbitrary distinction to draw. All the communication happens over text streams that speak and parse TAP.

Krinkle commented 3 years ago

@isaacs Thank you, that helps!

JS API

The specification seems to focus on a JS api with events passing between a "runner" object and "suite" or "test" objects.

The CRI spec isn't meant to imply JavaScript. The repo's name and heritage do come from that background, though, that's fair. It's important that we (at least) succeed in the JS space to unify the many non-interopable test frameworks, runners, and reporters. But, like TAP, I think CRI could be environment agnostic, specifying only a stream of information.

Join forces?

TAP focusses on the transmission protocol for the (virtual) wire, CRI focusses more on how the data gets on and off that same wire.

In TAP, the line protocol is concrete and the transmission method is left unspecified. In practice, I believe one generally expects to see separate proceses connected through a Unix pipe. When things are not inter-process, though, one could presumably agree unofifically on exposing the runner's writable stream, and exposing the reporter's consumer of the readable stream, and plug-in something other than stdin/stdout. But this is not standardised. CRI aims to close that gap.

I love the Unix philosophy, and this makes perfect sense for Node.js and other CLI contexts, especially when there's a single test run, and a single reporter. So far, I've not had a good sense of how this can be made to work outside that context in a way that is still interoperable. For example:

In the CRI draft we've described what is essentially a callback mechanism or minimal event emitter, with an abstract method that should exist for connecting a reporter (consumer), which then gets called with the described structured data in a language-native fashion.

Once everything is connected, the format of the data is currently a weak point in CRI. Perhaps we should move away from tryingto describe the structure in a language-neutral way and just say that it is "text", e.g. JSON-encoded, or even TAP. This would decoupled CRI and remove any ambiguity about how structures might be represented. if there's more than one way to do so in a given langage. It'd be up to the reporter to decide how they parse the stream.

My dream for CRI is that any test reporter can be declaratively plugged into any test framework or test orchestrator (BrowserStack, Karma, ..). Thus for the many reporters that produce artefact files, upload stuff, pretty-print etc to be re-usable, and for HTML reporters to be re-usable as well.

The below is JS-specific, but as concrete outcome for the JS ecosystem, I consider it sucess if we can:

I haven't considered TAP for this, but I am open to it. I do think we'd most likely still need something within this spec to adress the above, but we could certainly leverage TAP. I think that's worth pursuing. We may need to solve one or two issues upstream in that case, but I wouldn't mind helping with that! E.g. regarding nesting of tests, and (optional) test duration, and details of passed assertions. We might be able to do that as an extension in an ignored comment.

@isaacs I see this that "extended TAP" is the approach node-tap has taken. Could you share a bit about how well that has gone so far, in particular whether there is work underway on potentially formalising a common way for the extended data to be formatted so that e.g. a tap parser would be adopting that and providing it as optional a key-value object to consumers of that parser. Is that a direction that would make sense?

I'm also welcome any thoughts or feedaback on the CRI challenge of connecting consumers to producers.

Misc

Regarding test plans - I didn't realize the test plan "header" could be at the end of a TAP stream. That makes a lot of sense (since TAP 12).

Regarding nested tests - Yep, we need to support this. I consider it a requirement to be able to represent the structure of node-tap and tape with little to no loss in detail/semantics. I filed https://github.com/js-reporters/js-reporters/issues/126 for this.

Krinkle commented 3 years ago

@ljharb I think this is now addressed as part of https://github.com/js-reporters/js-reporters/pull/129. Could you take another look?

ljharb commented 3 years ago

It seems OK, but it's hard to know without some kind of programmatic verification tooling/testing.

Krinkle commented 3 years ago

Thanks for all the input. Continuing the conversation at https://github.com/js-reporters/js-reporters/issues/133. Next I'll see what we can do with Karma runner, BrowserStack, and in-browser reporting from frameworks the don't yet support TAP there today.