ziglibs / diffz

Implementation of go-diff's diffmatchpatch in Zig

Allocator-independent testing #24

Open mnemnion opened 2 months ago

mnemnion commented 2 months ago

The merge conversation in #23 has demonstrated that switching to bring-your-own-allocator memory management will require some changes in testing strategy.

If I'm understanding some context correctly, ZLS is using DiffMatchPatch, and surely doing so with an arena. So for benchmarking and consistency reasons, it would be good to be able to run the tests using an arena as well.

Furthermore, it would be good to ensure that an OutOfMemory failure doesn't leak memory, or improperly double-free it. std.testing has FailingAllocator for exactly that kind of check, so that's a minimum of three allocators worth running the test suite against.

Last, it would be good to set up the tests so that they're also benchmarks, using std.Timer to collect data. That could be conditionally reported from a separate build step, and is harmless to run when the information it provides isn't necessary. This could include running the tests many more times, in order to get useful amounts of timing data.

My sketch of a solution here is pretty simple: change each of the test blocks into a pair: a function, which takes an allocator and performs the checks, and a test block, which calls that function with an allocator. The functions should initialize a Timer and return its reading; that's probably the right level of granularity, but we should discuss that.
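
To make that concrete, here's a minimal sketch of what one such pair could look like. `testDiffBasic` and its body are placeholders rather than an actual diffz test; the only point is the allocator-plus-Timer wiring:

```zig
const std = @import("std");

// Placeholder for one migrated test body: takes the allocator to use,
// runs the existing assertions, and reports how long they took.
fn testDiffBasic(allocator: std.mem.Allocator) !u64 {
    var timer = try std.time.Timer.start();

    // Stand-in for the real assertions, which would pass `allocator`
    // through to the diffz calls under test.
    const buf = try allocator.alloc(u8, 16);
    defer allocator.free(buf);

    return timer.read(); // elapsed nanoseconds, for a benchmark step to collect
}

test "diff basic" {
    // Ordinary test run: testing allocator, timing result ignored.
    _ = try testDiffBasic(std.testing.allocator);
}
```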

How things are structured from there is less clear to me. I haven't used a failing allocator before, but it seems pretty simple: run a for (0..) |allowed_allocations| loop, which initializes a FailingAllocator to permit that many allocations, and break when we no longer catch OutOfMemory errors.
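
Reusing the `testDiffBasic` sketch above, that loop could look roughly like this (assuming a recent Zig where `FailingAllocator.init` takes an options struct with a `fail_index` field; older versions take the index directly):

```zig
test "diff basic under allocation failure" {
    // Raise the fail index until the whole body succeeds; every earlier
    // iteration has to surface error.OutOfMemory cleanly, and the backing
    // std.testing.allocator reports any leak or double-free along the way.
    var fail_index: usize = 0;
    while (true) : (fail_index += 1) {
        var failing = std.testing.FailingAllocator.init(
            std.testing.allocator,
            .{ .fail_index = fail_index },
        );
        _ = testDiffBasic(failing.allocator()) catch |err| switch (err) {
            error.OutOfMemory => continue,
            else => return err,
        };
        break;
    }
}
```

If I'm not mistaken, std.testing also ships a checkAllAllocationFailures helper that automates roughly this loop, which might save us the boilerplate.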

Whether the tests should be run with both std.testing.allocator and an arena, every time, is less clear to me. Currently, on an M1, a test run is absolutely dominated by build time: with has_side_effects = true set so the tests rerun without any build changes, the tests themselves finish in a few hundred milliseconds. So changes which bump the test running time up to a second or so wouldn't really move the needle in the usual workflow, where tests are run after a build.
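
For reference, the has_side_effects wiring is just a flag on the run step in build.zig. This is a fragment only; the root file name is an assumption about how the test step is currently set up, and the exact addTest fields vary a bit between Zig versions:

```zig
// build.zig fragment: force the unit tests to rerun even when nothing
// changed, so the run isn't reported as cached.
const tests = b.addTest(.{
    .root_source_file = b.path("DiffMatchPatch.zig"),
    .target = target,
    .optimize = optimize,
});
const run_tests = b.addRunArtifact(tests);
run_tests.has_side_effects = true;
b.step("test", "Run the test suite").dependOn(&run_tests.step);
```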

But it isn't obvious to me that double-testing with std.testing.allocator and an arena is an important thing to do every time. Another option is to add the arena as a build flag, which would comptime-switch from std.testing.allocator to an ArenaAllocator. I also don't know what happens when you back a FailingAllocator with an ArenaAllocator, but the design seems pretty composable.
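
On the test side, the comptime switch could be as small as this, assuming a bool option named `test_arena` exposed through `build_options` (both names are made up here) and reusing the `testDiffBasic` sketch from above:

```zig
const std = @import("std");
const build_options = @import("build_options");

test "diff basic (allocator chosen at build time)" {
    if (comptime build_options.test_arena) {
        // -Dtest_arena=true: run the same checks through an arena backed by
        // the testing allocator, so leaks from the arena itself still get caught.
        var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
        defer arena.deinit();
        _ = try testDiffBasic(arena.allocator());
    } else {
        _ = try testDiffBasic(std.testing.allocator);
    }
}
```

The build.zig side would be a b.option(bool, ...) fed into b.addOptions() and attached to the test module; the exact call for attaching it differs a little between Zig versions.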

So that's my sketch of a plan here, let me know what you think.

Techatrix commented 2 months ago

I can't think of a reason why testing with an arena allocator could be beneficial. If the code works with the testing allocator then it will work with an arena allocator, but maybe not the other way round. So we need the testing allocator but not the arena allocator. Running benchmarks with an arena allocator, on the other hand, would make sense.

On the topic of benchmarking, the current unit tests don't really represent the real workloads this should be measured against. I am wondering how https://github.com/google/diff-match-patch measures the performance of their implementations :thinking: Maybe something similar to what I did in https://github.com/ziglibs/diffz/pull/23#issuecomment-2212419733 could make sense?

mnemnion commented 2 months ago

> I can't think of a reason why testing with an arena allocator could be beneficial.
>
> Running benchmarks with an arena allocator, on the other hand, would make sense.

These both make sense to me, fwiw. Setting up the tests to take your choice of allocator is necessary to check OutOfMemory conditions, and that does make them reusable for arenas in a benchmarking context, with the right wiring.

If that ends up being the approach, "the same tests, just run a lot" is a low-effort way to get some performance data, but the current test suite isn't very demanding of the algorithms to begin with. I set up a test coverage step which uses kcov, and line coverage isn't bad: 94%, I believe. But diffCleanupEfficiency isn't tested anywhere, which means it isn't even being compiled. On the other hand, there are only a small handful of missed lines in the functions which do have tests, so that's well done.

But benchmarking is more of an integration test, in terms of the size and variety of test data it calls for. Probably it should get its own file and build step.
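
Something along these lines in build.zig would do it; the bench/bench.zig path is just a guess at a layout, and ReleaseFast is there so Debug overhead doesn't drown the timings:

```zig
// build.zig fragment: a separate `bench` step with its own source file.
const bench_exe = b.addExecutable(.{
    .name = "bench",
    .root_source_file = b.path("bench/bench.zig"),
    .target = target,
    .optimize = .ReleaseFast,
});
// (plus an addImport of the diffz module, so the benchmark can call into it)
const run_bench = b.addRunArtifact(bench_exe);
const bench_step = b.step("bench", "Run diffz benchmarks");
bench_step.dependOn(&run_bench.step);
```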