Improve organization of unit / integration / system tests

maribethb commented 1 year ago

Background

Tests can ideally be divided into roughly three categories:

Unit tests, that test single functions/methods or perhaps single classes. It is common to stub/mock the dependencies of the code under test in order to isolate it, which helps make the tests less brittle and allows them to execute more quickly.
Integration tests, in which multiple units of code (whole/multiple modules, perhaps) are tested to ensure that interactions between them are correct. Dependencies outside the integration being tested might still be stubbed/mocked, but this is often less necessary since the integration being tested might include most or all of the dependencies.
System tests, in which an entire program/library is being tested, including all its dependencies.

In the case of system tests of a software library like Blockly, it behoves one to test against the API that one is shipping. (If it's also a UI library, then one should additionally also be testing the UI—in our case the DOM elements Blockly creates and manages.) So any kind of system tests we are doing ought to be against the exports of core/blockly.ts, i.e., (historically) Blockly.*.

In the case of unit tests, it is generally preferable to test against only the module containing the code being tested. This enables the unit tests for a single module to be run without having to compile/load any other modules—most test frameworks have the ability to specify exactly what tests should be run, making it convenient to quickly run the tests for a particular module while working on it, without having to repeatedly run the entire test suite. It also allows testing of interfaces that are not exposed by the public API (which for an app, rather than a library, is basically all of them).

(Aside: to facilitate keeping tests and tested code in sync, unit tests are often kept very close to the code being tested. The Go convention is to keep the tests for foobar.go in foobar_test.go in the same directory, with the latter having full namespace access to the former without any explicit, as if it were part of the same file. In scripting languages like Perl and Python, it is not unusual for libraries to include the unit tests in the same file as the code under test, executed when the file is run as a script rather than imported as a module.)

Blockly is in the curious situation of being a library and still maintaining most of its original goog.provides structure. This means that (barring places where we have made changes relatively recently):

We intentionally export most of our internal APIs.
The entire API for some module with ID foo.bar.baz was publicly available via the foo.bar.baz namespace whether we wanted it to be or not. At best we could politely ask that some parts be treated as @private.
Previous to the goog.module migration, there was no way to access this module's API other than via the foo.bar.baz namespace.

So we have a lot of unit tests that are written against Blockly.*, and it's not easy to see the difference between unit, integration, and system tests just by looking at what they import.

More recently, some test modules have been created/modified to directly import submodules. This is some cases necessary in order to test internal-only APIs (or use test-only APIs for mocking/stubbing), but in other cases (as in the code in #7200) this has happened because of some confusion about what the appropriate approach is.

Recommendations

Specifically on the topic of imports, I would recommend that we adopt the convention that unit tests directly import the module containing the code under test, while system tests continue to test against the top-level API (Blockly.*). This should eliminate most situations where a test has reason to both use the Blockly.* namespace and import an individual module. (Integration tests might go either way, though inertia will I suspect tend to favour the status quo.)

I also recommend we:

Learn more about modern best practices for testing TypeScript code and organising test code (especially: outside Google).
Consider reorganising our tests if that will help make their structure more familiar to external contributors.
Consider more clearly separating unit, integration, and system tests.
Consider moving unit tests closer to the code being tested (if there is a reasonably good precedent for doing so in TypeScript projects).
Learn (and document) how to use mocha to quickly run subsets of tests (e.g., unit tests for a single module).
Reorganise our test framework so that at least unit tests, and preferably as many other tests as possible (e.g. generators), can—but do not have to be—be run headlessly in node.js, rather than in a browser.
Reorganise our test framework so that existing unit- and non-unit mocha tests, generator tests, and the new browser tests, or any desired subset thereof, can be run via a single mocha invocation that opens (at most) a single browser instance.

Originally posted by @cpcallen in https://github.com/google/blockly/issues/7224#issuecomment-1612671140

rachel-fenichel commented 1 year ago

Some of the current mocha tests are clearly unit tests and could be (relatively trivially) moved to run in node. An incomplete list:

[ ] Registry (registry_test.js). Mostly checks that when you put things into and out of the registry, it behaves. Does not depend on workspaces or blocks.
[ ] Touch (touch_test.js). Checks that shouldHandleEvent and getTouchIdentifierFromEvent work correctly. Relies on creating PointerEvents, so may not work in node.
[ ] Connection DB (connection_db_test.js). Mocks workspaces to create RenderedConnections, but mostly tests the sort and insertion algorithms.
- This catches errors when we change the insertion/deletion algorithms, e.g. to set up batching to improve performance.
[ ] Field registry (field_registry_test.js). Similar to registry testing, with some extra validation for fields.
[ ] Generator (generator_test.js). Most of the generator testing is done with our actual in-browser generator tests. This test basically checks that you can create an arbitrary new generator and have it call things correctly, rather than checking that the existing generators work for all existing blocks.
[ ] Names (names_test.js). Similar to the generators, for a different part of the generation system.
[ ] Shortcut registry (shortcut_registry_test.js). Similar to previous registry tests.
[ ] Utils (utils_test.js). Mostly checks that regexes and parsing behave.

rachel-fenichel commented 1 year ago

More notes on possible ways to unify tests:

Generator tests are essentially browser tests (they open a page, click around, and look at the results). We could trivially change them to just be another suite in the existing browser tests.
In the opposite direction, generator tests could just be run fully in node.

Not all tests need to be run on every change. Integration tests that take a long time could reasonably be moved to run nightly, which is what we are doing with browser tests. Both generator and advanced compilation tests may fall in this category. The goal of making that change would be to reduce the running time for building and testing during normal development workflows.

google / blockly

Improve organization of unit / integration / system tests #7250

Background

Recommendations