Test count may not be as useful as it could be

willm commented 2 years ago

Version

v18.3.0

Platform

Darwin willmunn-2 20.6.0 Darwin Kernel Version 20.6.0: Tue Feb 22 21:10:41 PST 2022; root:xnu-7195.141.26~1/RELEASE_X86_64 x86_64

Subsystem

test_runner

What steps will reproduce the bug?

I wanted to try the new test runner, being a long term tape user, this was pretty exciting as the apis are very similar. It seems the test summary at the end of the tap output is recording test counts as the number of test files. I had a go at a simple fizz buzz implementation:

import test from 'node:test'; import assert from 'assert';

import test from 'node:test';
import assert from 'assert';

const fizzbuzz = (num) => {
  const special = [
    {condition: num % 3 === 0, output: 'fizz'},
    {condition: num % 5 === 0, output: 'buzz'},
  ];
  const specialAnswer = special.reduce((output, x) => x.condition ? output + x.output : output, '');
  return specialAnswer ? specialAnswer : num;
};

test('1 should return 1', t => {
  assert.equal(fizzbuzz(1), 1);
});

test('3 should return fizz', t => {
  assert.equal(fizzbuzz(3), 'fizz');
});

test('5 should return buzz', t => {
  assert.equal(fizzbuzz(5), 'buzz');
});

test('15 should return fizzbuzz', t => {
  assert.equal(fizzbuzz(5), 'buzz');
});

test('16 should return 16', t => {
  assert.equal(fizzbuzz(16), 16);
});

node --test returns the following output:

TAP version 13
ok 1 - /Users/will.munn/code/experiments/node18/tests/test.js
  ---
  duration_ms: 0.07915145
  ...
1..1
# tests 1
# pass 1
# fail 0
# skipped 0
# todo 0
# duration_ms 0.126770722

note that the the output suggests that only 1 test has been run. After adding another test file, I noticed that the output is actually counting the number of files, not the number of test() blocks or the amount of assertions.

If I add a test.skip to one of the tests, it will still report that there are 0 skipped tests.

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior?

I personally think the best solution would be that The TAP output reports the number of test() blocks rather that the number of test files. So for my example:

TAP version 13
ok 1 - /Users/will.munn/code/experiments/node18/tests/test.js
  ---
  duration_ms: 0.07915145
  ...
1..5
# tests 5
# pass 5
# fail 0
# skipped 0
# todo 0
# duration_ms

What do you see instead?

TAP version 13
ok 1 - /Users/will.munn/code/experiments/node18/tests/test.js
  ---
  duration_ms: 0.07915145
  ...
1..1
# tests 1
# pass 1
# fail 0
# skipped 0
# todo 0
# duration_ms 0.126770722

Additional information

Note that my suggestion is actually different to what the tape module does, this reports counts based on the number of assertions. I personally feel that number of test blocks makes the most sense.

tniessen commented 2 years ago

cc @nodejs/test_runner @cjihrig

cjihrig commented 2 years ago

This could be improved by parsing the standard output of the child processes that run each test file. There is a TODO in the code regarding implementing a TAP parser. IMO that is the best approach, but also requires the most work. If someone really wanted to, they could implement more light weight parsing that, for example, only parses the ending summary lines of each child process.

manekinekko commented 2 years ago

This could be improved by parsing the standard output of the child processes that run each test file. There is a TODO in the code regarding implementing a TAP parser.

@cjihrig happy to work on this. Could you point me to the TODO comment?

aduh95 commented 2 years ago

@manekinekko https://github.com/nodejs/node/blob/adaf60240559ffb58636130950262ee3237b7a41/lib/internal/main/test_runner.js#L108-L109

manekinekko commented 2 years ago

Thank you @aduh95. I'm gonna work on this 👍

MoLow commented 2 years ago

I took a little look at this issue, and I was wondering if this can be approached by adding support for additional reporters other than the current TapStream, this will allow adding a plain JSON reporter that is already very easy to serialize/deserialize possibly another reporter can be using worker_threads postMessage to share data between the parent test and its children

this approach can also enable passing custom reporters that are implemented in userland as an option to the root test runner (assuming we define a common interface for a reporter) WDYT?

manekinekko commented 2 years ago

I agree with @MoLow!

this approach can also enable passing custom reporters that are implemented in userland as an option to the root test runner (assuming we define a common interface for a reporter)

I am in favor of this as well.

MoLow commented 2 years ago

@cjihrig I'd love to hear your feedback regarding the reporter's approach. @manekinekko would you like to implement some kind of parentPort.postMessage reporter?

cjihrig commented 2 years ago

I took a little look at this issue, and I was wondering if this can be approached by adding support for additional reporters other than the current TapStream

@cjihrig I'd love to hear your feedback regarding the reporter's approach.

In my head I always pictured the test runner outputting TAP via a stream and other reporters being implemented as transform streams (this part kind of depends on having the TAP parser in place).

manekinekko commented 2 years ago

would you like to implement some kind of parentPort.postMessage reporter?

@MoLow I am happy to give it a shot. Could you provide a high-level design so I have a bigger picture of the different pieces?

In my head I always pictured the test runner outputting TAP via a stream and other reporters being implemented as transform streams (this part kind of depends on having the TAP parser in place).

This would totally be doable once the TAP parser is done. The current AST has all the info to run it through a codegen and spit out virtually any format.

MoLow commented 2 years ago

In my head I always pictured the test runner outputting TAP via a stream and other reporters being implemented as transform streams (this part kind of depends on having the TAP parser in place).

I totally understand why implementing reporters as a transform stream makes sense,

my two arguments don't necessarily contradict that:

JSON is much more natural in javascript environments than TAP and can already be de/serealized out of the box, so I question TAP being the default output for a test? I assume that it can be very hard to write a TAP parser that beats JSON.parse, both in terms of performance and of usability. IMHO it makes more sense that the base stream outputs JSON objects and then use a transform stream to output TAP (in case one wants tap output and not something else)
the implementation of --test might leverage the ability to share memory between workers - which might be a little more efficient than serializing -> parsing -> serializing again I am not sure about this one as multiple child processes can leverage multiple CPUs, and this wont allow that

ljharb commented 2 years ago

Test runners in the ecosystem don't output JSON by default - ever, afaik. A number of them output TAP by default, though. TAP is the best choice.

cjihrig commented 2 years ago

I assume that it can be very hard to write a TAP parser that beats JSON.parse

I also think that we need to think in terms of streaming output, which JSON.parse() would not handle. Streaming JSON parsing is possible, but I still think TAP is the right choice for the default output format.

MoLow commented 2 years ago

ok, I've got your point :)

kmannislands commented 2 years ago

I've been experimenting with the new node:test API and I had the same thoughts as @MoLow re: JSON.

Going with TAP as the native format seems to introduce unnecessary friction for consuming tools and it sounds like internally within node:test as well.

In order to make sense of the test results of a file's test run, you need a TAP parser. In order to implement a proper TAP parser, you need a YAML parser. And parsing YAML correctly is very non trivial.

In implementing a TAP parser myself, I've shied away from full YAML parsing and decided to implement a parser for the subset of yaml that jsToYaml can actually produce at the moment. But I can't help but feel like something is off here. YAMLs a complex format that's rather foreign in the node/js context. It seems node core would need a full yaml parser/encoder sub-lib in core to make this futureproof and that seems like quite a lift.

I also think that we need to think in terms of streaming output, which JSON.parse() would not handle. Streaming JSON parsing is possible, but I still think TAP is the right choice for the default output format.

Sure streaming JSON, but is that really the problem space here? I would imagine ndjson could be used with newlines delimiting results.

For a stream emitting something like:

{ testPlan: { start: 1, through: 2 } }
{ testNumber: 1, test: 'ok',  durationMs: 50.12312 }
{ testNumber: 2, test: 'not ok', durationMs: 50.12312, diagnostic: 'TODO # reason',  failureType: 'testCodeFailure' }

could be parsed pretty trivially with built-in functions (no 10k+ LOC yaml parsing lib dependency) as a stream:

const testNdJSONStream = run();

export async function* readableStreamLines(
    readableStream: NodeJS.ReadableStream
): AsyncGenerator<string> {
    for await (const tapBytes of readableStream) {
        if (!Buffer.isBuffer(tapBytes)) {
            throw new Error(
                `Didn't receive of bytes from TAPStream as expected`
            );
        }
        yield tapBytes.toString('utf-8');
    }
}

for await (const resultLine of readableStreamLines(testNdJSONStream)) {
  const testResult = JSON.parse(resultLine)
}

kmannislands commented 2 years ago

Another idea I had when considering all of this, node could potentially respect the TAP 13/14 spec while side-stepping the YAML parsing problem.

TAP 13 spec basically just specifies that the contents between --- and ... are valid YAML and not much else

Currently (2007/03/17) the format of the data structure represented by a YAML block has not been standardized.

What if node:test serializes contents in this block to a subset of flow style yaml instead of the subset of block style YAML that is presently being used?

This subset of flow style could also be valid JSON meaning that node:test would be outputting valid TAP 13 per the spec while still being easily parseable from js without a yaml lib.

Something like:

TAP version 13
# Subtest: /foo/bar/combinators.test.js
not ok 1 - Static HTML JSX Rendering
  ---
  { "duration_ms": 4.15925,  "failureType": 'subtestsFailed', "error": '1 subtest failed', "code": 'ERR_TEST_FAILURE' }
  ...
1..1

This is valid TAP 13, the embedded doc is valid YAML and it is far friendlier to parse from node.js.

MoLow commented 1 year ago

completed via https://github.com/nodejs/node/pull/43525

nodejs / node