Open zetanumbers opened 1 month ago
I don't know much about Helgrind, so I can't say much about the comparison.
As for loom, it uses the DPOR (dynamic partial-order reduction) algorithm: whenever the program interacts with a loom primitive, loom decides whether or not to switch threads at that point, and it explores all combinations of those switch points to verify the program's correctness.
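To make that concrete, here is a minimal sketch of what such a loom test tends to look like (the test body and names are illustrative, not taken from this thread); loom re-runs the closure passed to `loom::model` once per interleaving of the loom primitives it touches:

```rust
use loom::sync::atomic::{AtomicUsize, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn concurrent_increments() {
    // loom re-executes this closure, exploring a different interleaving of the
    // loom primitives (spawn, atomic ops, join) on each iteration.
    loom::model(|| {
        let counter = Arc::new(AtomicUsize::new(0));

        let handles: Vec<_> = (0..2)
            .map(|_| {
                let counter = counter.clone();
                thread::spawn(move || {
                    counter.fetch_add(1, Ordering::SeqCst);
                })
            })
            .collect();

        for handle in handles {
            handle.join().unwrap();
        }

        // If any explored interleaving violates this, loom reports the failing execution.
        assert_eq!(counter.load(Ordering::SeqCst), 2);
    });
}
```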
Here are some materials that explain the internals in more detail.
They're quite different. The way loom works is that it's able to precisely control how different threads get interleaved with each other in multi-threaded code, and then it runs the test millions and millions of times to try every possible way of interleaving the threads. If there's any interleaving that crashes or deadlocks, it reports a test failure.
On the other hand, Helgrind runs the test once and tries to detect errors in the program based on that single execution.
As is hopefully clear, Helgrind will be much faster. Sure, it pays some extra cost per instruction to run under Valgrind, but it only runs the test once, not millions of times. Running Tokio's loom tests can take 30 minutes to an hour, and it's only that fast because we split the loom tests across ~10 different CI jobs that run in parallel.
However, loom will be much more thorough. Helgrind has some heuristics to catch common mistakes; loom tries every possible execution out there.
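Since the original question is about deadlocks: here is a hypothetical sketch (not code from this thread) of the kind of model loom can fail on, two threads taking two `loom::sync::Mutex`es in opposite orders. Among the interleavings loom schedules is the one where each thread holds one lock and blocks on the other, and loom reports that execution as a deadlock.

```rust
use loom::sync::{Arc, Mutex};
use loom::thread;

#[test]
fn lock_order_inversion() {
    loom::model(|| {
        let a = Arc::new(Mutex::new(0));
        let b = Arc::new(Mutex::new(0));

        let (a2, b2) = (a.clone(), b.clone());
        let t = thread::spawn(move || {
            // Thread 1 locks A, then B.
            let _guard_a = a2.lock().unwrap();
            let _guard_b = b2.lock().unwrap();
        });

        // Main thread locks B, then A. In the interleaving where thread 1 already
        // holds A, both threads end up blocked, which loom flags as a deadlock.
        let guard_b = b.lock().unwrap();
        let guard_a = a.lock().unwrap();

        // Release both locks before joining so the non-deadlocking interleavings
        // can complete.
        drop(guard_a);
        drop(guard_b);
        t.join().unwrap();
    });
}
```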
I think that in order to understand the difference between Loom and runtime checkers like Helgrind or TSAN, it's worth noting that Loom is a model checker that simulates the behaviors permitted by the abstract C++ memory model, while Helgrind tries to detect errors observed while executing the actual compiled program. This means that, while Loom is substantially more exhaustive thanks to its ability to try every execution path permitted by the memory model, it won't catch miscompilations where a compiler bug produces a binary that behaves in a way the memory model does not permit. Similarly, if Loom's implementation of the behavior permitted by the model is incorrect, a Loom bug may cause it to miss certain bugs in the code under test by not exploring a potential execution path that could occur in Real Life.
On the other hand, sometimes concurrency bugs exist only on particular platforms. For instance, some potential interleavings of atomic operations permitted by the C++ memory model will not actually occur in real life on x86 systems, because x86 CPUs have strong memory-ordering semantics for all operations, whereas other CPU architectures, such as ARM, lack those ordering guarantees except for particular instructions or access patterns. This means you can write code with a race that is permitted by the memory model but will never manifest on x86 systems, yet could occur on ARM systems. Loom will detect such an issue regardless of what platform you're actually running your tests on, while a tool like Helgrind requires you to actually run your tests on a given CPU architecture in order to catch issues that occur on that architecture.
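To illustrate with a hypothetical sketch (again, not code from this thread): a "publish via flag" pattern that uses only `Relaxed` atomics is permitted by the model to make the flag visible before the data, something x86's hardware ordering tends to hide in practice. A loom model of it is meant to explore that weak-memory execution regardless of the host CPU; exactly which relaxed behaviors a given loom version explores is an implementation detail, so treat this as illustrative. The fix would be `Release` on the flag store and `Acquire` on the flag load.

```rust
use loom::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use loom::sync::Arc;
use loom::thread;

#[test]
fn relaxed_publish() {
    loom::model(|| {
        let data = Arc::new(AtomicUsize::new(0));
        let ready = Arc::new(AtomicBool::new(false));

        let (d, r) = (data.clone(), ready.clone());
        let producer = thread::spawn(move || {
            d.store(42, Ordering::Relaxed);
            // Relaxed: nothing forces the `data` store to become visible before
            // the flag store. Using `Ordering::Release` here (with `Acquire` on
            // the load below) is what would establish that guarantee.
            r.store(true, Ordering::Relaxed);
        });

        if ready.load(Ordering::Relaxed) {
            // The memory model permits observing the flag without observing
            // `data`, so an execution explored by loom can fail this assertion
            // even if it never reproduces on an x86 machine.
            assert_eq!(data.load(Ordering::Relaxed), 42);
        }

        producer.join().unwrap();
    });
}
```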
Another note is that, because Loom is a model checker, diagnosing a bug using Loom requires writing a Loom model that reproduces the particular scenario that causes the bug, which means you kind of already have to know what's gone wrong. On the other hand, I believe Helgrind can be used to diagnose an observed problem by running the software with the inputs that reproduce that problem, and having it report where the program was executing when the data race or deadlock occurred.
So, in conclusion, they are different tools that attempt to catch the same categories of bug using very different approaches. I think they're pretty complementary. I would use Loom for writing models that test concurrent code during development, and for writing regression tests when fixing a bug. On the other hand, I might reach for Helgrind (or similar tools like ThreadSanitizer) when I'm trying to nail down an issue that has occurred at runtime, but I would not rely on such tools to deterministically and exhaustively explore all possible executions of a concurrent program, since...they don't do that.
It was suggested to me that I test some hard-to-reproduce deadlocks in Rust code using Valgrind. Assuming deadlock detection in Valgrind is done by Helgrind, would Loom or Helgrind do a more exhaustive check, and which would be more likely to complete faster? What are the differences between these tools' algorithms for deadlock detection?