wintercg / proposal-minimum-common-api

https://min-common-api.proposal.wintercg.org/

Cross-Runtime Testing APIs? #68

Open karlhorky opened 9 months ago

karlhorky commented 9 months ago

Since Node.js, Bun and Deno offer different APIs (node:test, bun:test, Deno.test) but have many similarities, would it make sense to have a set of specced cross-platform testing APIs?

Motivating Example

runtime: prefix, in the style of Node.js or Bun:

import { test } from 'runtime:test';

test('2 + 2', () => {
  expect(2 + 2).toBe(4);
});

Or, Deno-style global:

Runtime.test("assert works correctly", () => {
  assert(true);
  assertEquals(1, 1);
});

// Using existing global:
globalThis.test("assert works correctly", () => {
  assert(true);
  assertEquals(1, 1);
});

This was originally inspired by @nzakas's tweet here:

What I want: To write JavaScript tests once and be able to run them across Node.js, Bun, and Deno.

Problem: Bun and Deno have built-in test runners you have to import from to run tests. I use Mocha. This doesn't work.

Solution: ???

Source: https://twitter.com/slicknet/status/1762264774166085937

@CanadaHonk rightly mentions that, while there are similarities, there are probably challenges creating a spec:

most test framework APIs are pretty similar?

probably hard to spec due to sandbox/isolation/etc

Source: https://twitter.com/CanadaHonk/status/1762417370893516929

Alternatives Considered

Separate Package

Instead of something built into the standard library of multiple runtimes, use a separate package (either like Vitest, including testing features of its own, or a wrapper package).

Downsides:

  1. Doesn't work out of the box or with zero dependencies
  2. Downloading and maintaining another package in devDependencies

voxpelli commented 9 months ago

Why not use a test library instead then?

ekwoka commented 9 months ago

I think all the things you could definitely get them to agree on they already agree on.

It's the plugins and extensions that they might disagree on.

And you can use something like vitest.

While it's cool to have the runtime also have a test runner, it's not exactly in the scope of a runtime, is it? 🤔 Should package management also be prescribed?

pi0 commented 9 months ago

If someone wanted to try something like this as a package, it would be more than welcome as part of unjs.io (cross-linked to https://github.com/unjs/community/discussions/2) ❤️

This just reminded me that I started an untest package with this idea two years ago and forgot to even push my progress, since Vitest was born and it was amazing! I think the main limitation is that we need to split the standard assertion library from the runtime logic to make this happen.

ljharb commented 9 months ago

Since the ecosystem hasn’t ever come close to cohering on a standard pattern in libraries, i don’t think it makes sense to standardize one. Platforms choosing to offer a test framework despite this lack of coherence shouldn’t justify continued artificial forcing of patterns.

pi0 commented 9 months ago

@ljharb Are you aware of any previous efforts where the particular runtimes (Node, Bun, and Deno) explicitly disagreed with a unified API for testing?

It is not an artificial need that today we cannot test a library against all WinterCG runtimes. (And the list is growing: llrt made a test API for itself, based on node-assert and a Jest-like runtime.)

pi0 commented 9 months ago

Also looking at the readme:

How are the APIs selected? By looking at the APIs that are already implemented and supported in common across Node.js, Deno, and Cloudflare Workers. If there were at least two implementations among those -- either already supported or in progress -- then it was added to the list.

I think this at least meets the 2/3 criterion of this repo (or I might be totally wrong).

ljharb commented 9 months ago

@pi0 the issue isn’t disagreement, it’s artificially forcing into a standard something that never went through rigorous intentional design nor got extensive userland usage prior to implementation.

pi0 commented 9 months ago

I always imagined that API standards are specifically designed to be strict and consistent, and that WinterCG is a place to discuss designing such (common) APIs to be consistent.

Would it still be something (in the vision of the @wintercg group) to be continued, perhaps as a new proposal (not common-api, but something like proposal-testing-api)?

Qard commented 9 months ago

I don't think the test framework itself should be standardized, but I do think the lower-level machinery around it probably should be: for example, a core assert(...) interface that could be wrapped to build the prettier variations like expect(...) with its chainability. I also think we should probably standardize a sandbox at the single-test-unit level and allow frameworks to compose and rearrange units as they see fit. There should probably also be a standardized functional output for that single unit of work, with an interface that receives some event data about the run, but no standardization of text output.

Test frameworks have varying structural opinions, but at the core of all of them is the need for a well-isolated sandbox which can intercept and analyze any failure from within. Standardizing that sandbox could be reasonable.
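As a rough sketch of that layering (all names here are illustrative, not a proposed spec), a chainable expect() can be built on top of a bare assert() primitive:

```javascript
// Illustrative only: a bare-bones assert() primitive, plus an
// expect()-style wrapper layered on top of it. Not any runtime's real API.
function coreAssert(condition, message = 'Assertion failed') {
  if (!condition) throw new Error(message);
}

// The "prettier variation": chainable expect(), built purely on coreAssert().
function expect(actual) {
  return {
    toBe(expected) {
      coreAssert(
        Object.is(actual, expected),
        `Expected ${String(actual)} to be ${String(expected)}`
      );
      return this; // chainable, so further matchers could follow
    },
  };
}

expect(2 + 2).toBe(4); // passes silently; a mismatch would throw
```

Only coreAssert() would need standardizing in this picture; the expect() layer stays userland.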

mk-pmb commented 9 months ago

Whatever we standardize here (if anything), please make sure it's easy to test (failing) async functions and promises (including rejection and timeout).

ljharb commented 9 months ago

@Qard that's not how they all work, but it's definitely worth reviewing what someone can come up with - I'd say that lacking ecosystem coherence, any approach that isn't compatible with the top N test frameworks (arbitrarily, 10?) isn't a good idea to move forward with.

ekwoka commented 9 months ago

I do see the benefit of standardizing the assert/expect behaviors/apis, even if other things like describe/it/test would be more up for grabs. As long as it leaves freedom for userland extension.

Qard commented 9 months ago

@ljharb Yep, agreed. We might not be able to make a tool that everyone will want to use, but as long as it satisfies most users it should be fine. It might even be several tools; probably would be, since I would think assertions should be able to exist on their own, apart from an error-catching execution sandbox.

A similar case is URLPattern: not every routing framework will want to use that style, but for many it's enough. Putting aside whether the design of that particular API is any good, I'm just using it as an example of an API solving a specific use case, not trying to be everything for everyone.

nzakas commented 9 months ago

From my point of view, what I would like is something along the lines of:

import { describe, it, expect } from "std:test";

Where each runtime could define std:test however it wants, or even let the developer define what std:test points to. As @ljharb points out, there's already plenty of alignment on how these things work; it's really just a matter of needing to switch between node:test, bun:test, and deno:test, or other options, that makes the dream of writing your JS tests once and running them in any runtime a problem.

ljharb commented 9 months ago

Even describe/it aren't universal among testing paradigms - that's just BDD, which isn't the only testing paradigm.

mk-pmb commented 9 months ago

Would it help to always export describe and it, but in engines that don't support them, they're false rather than a function?
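A sketch of that idea (loadRuntimeTest() is a hypothetical stand-in for importing whatever test module the current runtime provides):

```javascript
// Hypothetical wrapper: always export `describe` and `it`, but as `false`
// on runtimes whose test module lacks them. Names are illustrative.
function loadRuntimeTest() {
  // A real wrapper would conditionally import node:test, bun:test, etc.
  // Here we simulate a runtime that only offers a flat test() function.
  return { test: (name, fn) => fn() };
}

const native = loadRuntimeTest();
const describe = typeof native.describe === 'function' ? native.describe : false;
const it = typeof native.it === 'function' ? native.it : false;

// Consumers can then feature-detect instead of crashing on import:
if (describe && it) {
  describe('suite', () => it('works', () => {}));
} else {
  native.test('works (flat style)', () => {});
}
```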

voxpelli commented 9 months ago

Problem: Bun and Deno have built-in test runners you have to import from to run tests. I use Mocha. This doesn't work.

As one of the new maintainers of Mocha I added an issue that tracks support for Bun / Deno – as so far no such feature request existed in the Mocha project: https://github.com/mochajs/mocha/issues/5108


I think @isaacs's reply to @wesleytodd here is great at describing some of the different approaches that make standardizing hard: https://blog.izs.me/2023/09/software-testing-assertion-styles/

The one thing that has been fairly successfully standardized is the TAP output: https://testanything.org/ (And similarly SARIF for static analysis output: https://sarifweb.azurewebsites.net/)
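For reference, a TAP stream for two tests looks like this (TAP version 14 shown; the indented YAML block between `---` and `...` carries diagnostic metadata for the failure):

```
TAP version 14
1..2
ok 1 - addition works
not ok 2 - subtraction works
  ---
  message: "Expected 0 to be -1"
  severity: fail
  ...
```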


The problem that today we cannot test a library against all WinterCG runtimes seems like an honorable one – but this proposal is for a specific solution to that problem, I would suggest re-framing.

There exists some prior art in testing things across plenty of runtimes, such as https://github.com/bterlson/eshost (with https://github.com/devsnek/esvu / https://github.com/GoogleChromeLabs/jsvu) but that injects its own set of functions into the runtime rather than relying on them being built in.


Another dimension to this:

Is this the first dev-environment-oriented WinterCG proposal? Would all runtimes be expected to include this in their production runtimes as well? Separate test frameworks are rarely included in production builds, but I guess Node, Deno, and Bun all include theirs everywhere. How would possibly including a test framework in production affect the serverless-oriented runtimes that are trying to be as slim as possible?

mcollina commented 9 months ago

Unfortunately it’s not just about how the tests are written; the different execution models make it very hard, or even impossible, to have the same tests run the same way on multiple runtimes. I think this problem is better tackled in the ecosystem rather than in WinterCG.

(The need for a cross runtime testing solution is there).

ljharb commented 9 months ago

Very well stated @mcollina, i agree - the problem is VERY real, but that doesn't mean a good native solution is possible.

It's very hard for humans to accept that a thing they want isn't something they can have, but I hope we're able to accept that if indeed that's the case here.

mcollina commented 9 months ago

I think that a tool can be written to run tests on all runtimes, but it needs to take into account the different execution models, how they handle TS compilation, if bundling is needed, etc.

Qard commented 9 months ago

I wasn't thinking quite to that level. I was more thinking just an execution sandbox with some domains-like error intercepting functionality to capture any errors thrown, rejections left unhandled, etc. Most of what test frameworks do from a user-facing perspective is just structural organization things, which is fairly trivial compared to the much more complicated internal need of effective sandboxing. If the internal complications are mostly handled by some existing standard then it would be fairly easy for people to make their own testing frameworks to follow whatever structure they want on top of that.
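A minimal sketch of that sandbox idea (names are illustrative): run one unit of work, unify sync throws and promise rejections, and report structured event data rather than formatted text. A real sandbox would also need to catch rejections from detached promises, which this does not.

```javascript
// Hypothetical single-unit sandbox: catches sync throws and awaited
// rejections alike, and emits structured event data for the run.
async function runUnit(name, fn) {
  const started = Date.now();
  try {
    await fn(); // awaiting unifies sync throws and rejected promises
    return { name, pass: true, durationMs: Date.now() - started };
  } catch (error) {
    return { name, pass: false, error, durationMs: Date.now() - started };
  }
}

const results = [
  await runUnit('sync pass', () => {}),
  await runUnit('async fail', async () => { throw new Error('boom'); }),
];
console.log(results.map((r) => `${r.name}: ${r.pass ? 'pass' : 'fail'}`).join(', '));
```

A framework would then layer describe/it-style structure, or any other organization, on top of events like these.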

mk-pmb commented 9 months ago

Revisiting @nzakas's comment

it's really just the matter of needing to switch between node:test, bun:test, and deno:test, or other options,

Is the problem just about the import identifier? Is that something we can solve with import maps?
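For the specifier half of the problem, an import map could at least remap a shared bare specifier per runtime. A hypothetical Deno-style config might look like this; whether each runtime would honor such a mapping for its own test module is exactly the open question:

```json
{
  "imports": {
    "test": "node:test"
  }
}
```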

ljharb commented 9 months ago

too bad it’s not just “test” like every other node core module, or it would be easy :-)

ekwoka commented 9 months ago

I think an exploration of whether a runtime-agnostic test runner could be made would be worthwhile, either one that taps into the native runners as its core entirely, or a library that offers compatibility with their APIs.

This would help identify if/where the different runners meaningfully conflict and which behaviors are actually shared.

pi0 commented 9 months ago

@voxpelli This is the first dev-environment-oriented WinterCG proposal. Would all runtimes be expected to include this in their production runtimes as well? Separate test frameworks are rarely included in production builds, but I guess Node, Deno, and Bun all include theirs everywhere. How would possibly including a test framework in production affect the serverless-oriented runtimes that are trying to be as slim as possible?

Yes, I guess this is the first time we are thinking of something related to the dev side as well here.

At least I'm considering introducing a production tests API for Nitro. Today there is also no easy way to cross-runtime test deployment targets locally. I see even a minimal spec introduced in WinterCG as being usable in both the dev and runtime cases. (Thanks for the write-up and all you are doing, btw!)


@mcollina Unfortunately it’s not just about how the tests are written, but the different execution model makes it very hard to impossible to have the same tests run the same on multiple runtimes.

Could we think of WinterCG's scope as defining how tests are to be written in order to make them "runnable" in common runtimes? A subset: a minimal and optional suggestion RFC that implementations can follow as a common-ground recommendation to reduce further divergence.

isaacs commented 9 months ago

I am not so sure that it is just a question of switching the import specifier. The semantics do actually matter, especially when it concerns asynchronous throws, test parallelism, and the correspondence of output to input.

It would be possible, of course, to fully specify all of these semantics such that browsers and server runtimes could expose a test import with the mocha interface, but I believe it would be a mistake to just assume that because mocha exists and runtimes provide things with these names already, that such specification is unnecessary.

If we look at language runtimes that have broad consensus about test primitives, then what I called "tap style" (similar to the junit approach in .NET and Java, albeit with xml instead of TAP) has a lot of advantages, since it carves the problem up into more or less discrete smaller problems:

  1. The specification of the protocol itself (largely accomplished by the TAP specification, though it would be worthwhile to pursue consistency on the specific fields present in diagnostic metadata; junit xml would be another choice, but has a similar "escape hatch" that would benefit from some more explicit metadata conventions)
  2. The specification of language features that generate given protocol outputs. Each test primitive (whether an object with t.plan(), t.assert(), etc., or mocha BDD style describe/it methods) can be specified in terms of the protocol elements it generates.
  3. The display, collation, and other analysis of test results can be specified in terms of how protocol elements are parsed, consumed, and presented.
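Point 2 can be illustrated with a toy primitive whose calls map one-to-one onto protocol lines (the t.ok() name echoes TAP-family runners; nothing here is a real runtime API):

```javascript
// Toy example: each ok() call emits exactly one TAP test line, and
// done() emits the trailing plan line. Purely illustrative.
function createTap(write = console.log) {
  let count = 0;
  const lines = [];
  const emit = (line) => { lines.push(line); write(line); };
  return {
    ok(condition, description) {
      count += 1;
      emit(`${condition ? 'ok' : 'not ok'} ${count} - ${description}`);
      return Boolean(condition);
    },
    done() {
      emit(`1..${count}`);
      return lines;
    },
  };
}

const t = createTap();
t.ok(2 + 2 === 4, 'addition works');   // emits "ok 1 - addition works"
t.ok(2 + 2 === 5, 'math is broken');   // emits "not ok 2 - math is broken"
const output = t.done();               // emits "1..2"
```

Specifying the protocol-generation layer this way leaves display and collation (point 3) entirely to independent tools.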

From a more meta point of view, I don't think this is (yet) a good fit for WinterCG. There has been ample exploration of the space in userland over the years, and it's clear that there are multiple ways to skin this cat, and that people do not at all agree on the best way to do it. There is no web/browser/language standard on how to do it. So, unlike for example the shape of and behavior of Request/Response objects, where WinterCG can take existing web standards and JS language concepts and adapt them in a straightforward way to server use cases, this would be taking existing serverside JS userland concepts (which in some cases are not semantically equivalent to similar features in browser JavaScript), and blessing them as a "specification".

While the risks are no doubt surmountable, this raises the following concerns:

  1. Devaluing the quality of WinterCG's blessing, if it is blessed prematurely.
  2. Conflicting with a future specification that does cover both browser and server JS use cases.
  3. Privileging the concerns of one server JS runtime over another, without adequately considering why another runtime did it differently. (This can happen especially easily if the specification is insufficiently precise and says "do it like how Node does it" or something, resulting in a case where one runtime is the de facto standard, and all others need to copy its bugs, which get frozen permanently.)
  4. Privileging the preferences of some group of users over another, without adequately considering the tradeoffs or reasons for choosing differently.

ljharb commented 9 months ago

@isaacs oh i just meant the specific problem referenced here would be easier; obv it's as complex as you describe in actuality.

I completely agree with your stated position here; thanks for stating it much more eloquently than i did.