Open andrewrk opened 4 months ago
I think this would require us to have async for something like cancel/await, or to spawn a manager thread. With the status quo, how do you plan to time out the function in a single-threaded environment?
@Rexicon226 I believe the build system already spawns subprocesses for testing in parallel - a spawned test
process could communicate the timeout requirement back to the parent build (/ test server) process, which could then enforce it. (EDIT: actually stated in OP that the build system is responsible for this)
Alternatively, the timing requirement could also be checked at any point during the test (e.g. when calling std.testing functions) and trigger an error (or even a direct process exit) if the limit has elapsed.
That said, even if we didn't cancel the test directly and instead only failed it after letting it complete (EDIT: turns out this is also stated in the OP), I think this system could still be used empirically both to catch regressions and to establish baseline timings of tests to optimize upon. It would therefore still help toward the stated goal of keeping unit test times below an acceptable threshold.
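The cooperative-check idea above can be sketched briefly. This is illustrative Python, not the real test runner: the `Deadline` type and `checkpoint` method are made-up names standing in for whatever hook the testing helpers would actually call.

```python
import time

# Hypothetical sketch of the cooperative approach: record a deadline when a
# test starts, and have each std.testing-style helper check it on entry.
class Deadline:
    def __init__(self, limit_seconds):
        self.expires_at = time.monotonic() + limit_seconds

    def checkpoint(self):
        # Called at each testing-helper entry point; fails once elapsed.
        if time.monotonic() > self.expires_at:
            raise TimeoutError("test exceeded its time limit")

d = Deadline(0.05)
d.checkpoint()          # well within the limit: no error
time.sleep(0.1)
try:
    d.checkpoint()      # past the deadline: raises TimeoutError
    timed_out = False
except TimeoutError:
    timed_out = True
print(timed_out)        # True
```

The obvious limitation, as noted above, is that a test which hangs without ever calling a testing helper is never interrupted; only the out-of-process approach catches that case.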
> a spawned test process could communicate the timeout requirement back to the parent build (/ test server) process,
This is what I was referring to. This test process runs on a single thread (usually), and has no way of interrupting the unit tests while they are running (or hanging). I suggested that a potential solution could be cancel/await inside of the test runner, but I'm curious what other people have in mind.

We could set a time limit for the entire test runner process, but that is a bad idea because we would lose information about which test it was that timed out.
> a spawned test process could communicate the timeout requirement back to the parent build (/ test server) process,
> This is what I was referring to. This test process runs on a single thread (usually), and has no way of interrupting the unit tests while they are running (or hanging).
AFAIU that would be the responsibility of the parent zig build process - it can simply kill the respective child zig test process. (As stated, zig test spawned directly may instead complete the test before reporting it as a failure.)
Do you see an issue with this? I imagine it already does the same thing on receiving a SIGKILL or terminal interrupt, so I don't think we would be introducing a new mechanism here.
The issue that I see is that we would not know which test was the one hanging (timing out).
> The issue that I see is that we would not know which test was the one hanging (timing out).
From how I interpret lib/compiler/test_runner.zig (looking at it for the first time), every single test execution is requested by the parent process.
So the build process already knows which test it asked a particular child process to run, and that will be the hanging test if it times out.
We already have to add a new message type for communicating the expected time limit,
plus I assume we can freely modify the test-server communication protocol, since it's an implementation detail internal to the Zig toolchain.
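The parent-side enforcement described here can be sketched in a few lines. This is a minimal Python illustration, not the real zig build / test-server code: the sleeping child stands in for a hanging test worker, and the test name is made up. The key point is that because the parent requested this specific test, it knows exactly which test to blame when the deadline passes.

```python
import subprocess
import sys

def run_test(name, limit_seconds):
    # Stand-in for a child test process that hangs (sleeps "forever").
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        child.wait(timeout=limit_seconds)
        return (name, "passed")
    except subprocess.TimeoutExpired:
        child.kill()   # the parent simply kills the misbehaving child
        child.wait()   # reap it
        return (name, "timeout")

print(run_test("decls.test.hangs", 0.2))
```

No new mechanism is needed beyond what killing a child on SIGKILL/interrupt already involves; the only protocol addition is communicating the limit (and, per the diagnostics idea, the elapsed time) between parent and child.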
That's an interesting idea: we could track the "latest" test being run, and then we'll know which one timed out. In the future I would like to have this ability in the test runner itself, zig test, but I can't think of a way to do this without async. With the test server, I agree that the build runner just killing the test process is the best idea.

To add, I believe we'll also need a message type for the amount of time each test spent, to support this --test-timeout-diagnostics flag. My only concern there is that users might mistake this for benchmarking, but that is a small concern.
A real world case I personally have is in the RISC-V backend, it's not uncommon for tests to go into an infinite loop because of incorrect codegen, and I have to manually set a timeout in my wrapper script.
I propose two enhancements that, together, are aimed at helping developers keep total unit test times below an acceptable threshold.
The first enhancement is to have each unit test have a time limit. This would default to 1 second, and any test could change its own time limit with API such as std.testing.setCurrentTimeLimit(2.0). When running zig test or zig build there would be a flag to scale this number. For example, --test-timeout-scale=2.0 would give twice as much time to each unit test. Idea here is that you would set this value on hosts significantly slower or faster than normal.

When the time limit of a unit test is exceeded, the build system will kill the misbehaving test and report it as a timeout. zig test would run forever and then report a timeout if it took longer than expected.

The second enhancement is a flag for discovering the relative durations of unit tests. Something like --test-timeout-diagnostics which would display a sorted list of unit tests by how long they took to run. When this flag is run at the build system level, it would aggregate the data across all sets of unit tests.
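The shape of that diagnostics output might look like the sketch below. This is illustrative Python, not the proposed implementation: the test names and durations are invented, and the report format is one plausible rendering of "a sorted list of unit tests by how long they took to run" with the scale flag applied to the default 1-second limit.

```python
# Fake per-test durations standing in for data the runner would report back.
durations = {
    "std.fmt.test.parseFloat": 0.42,
    "std.mem.test.indexOf": 0.03,
    "std.json.test.roundtrip": 1.10,
}
scale = 2.0                    # e.g. --test-timeout-scale=2.0 on a slow host
effective_limit = 1.0 * scale  # default 1 s per-test limit, scaled

print(f"per-test limit: {effective_limit:.1f}s")
# Slowest first, so the tests worth optimizing are at the top.
for name, secs in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{secs:5.2f}s  {name}")
```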
This is just a fun side benefit, but --test-timeout-diagnostics is the suggestion that would be given to folks asking "how do I disable the cache when running tests?", which would encourage people who are, let's be honest, probably doing naughty things in their unit tests to at least think about how long those tests take to run.