Why Property Testing Finds Bugs Unit Testing Does Not

tieubao commented 3 years ago

For those of you just joining us, Property-Based Testing is when, instead of giving specific input-outputs in your test, you describe the general properties the execution should have. Then you try it on randomized values. A simple example is testing that addition is commutative:

def add(a, b):
    return a + b

@given(integers(), integers())
def test_add_is_commutative(a, b):
    assert add(a,b) == add(b,a)

(I’m using pytest and hypothesis for everything.)

You can read a deeper treatment of PBT here. PBT is powerful because it lets you test a wider range of inputs, but you have to learn how to come up with properties and how to do complex strategies (input generators), which are both skills.

Anyway, Brian’s argument in the tweets is roughly this:

The majority of errors you find with testing are either issues with an entire “partition” of inputs or “boundary” inputs, like INT_MIN.
For partition errors, you can pick any random issue and find it, in which case you might as well just write a unit test for that input.
For boundary errors, those arise from the specification of the problem, and to handle those, it’s necessary to think hard about the problem and what the unexpected cases will be, regardless of how you plan to test.
In which case PBT doesn’t provide sufficient benefit over manual unit tests.

I don’t think that’s exactly his line of reasoning, but I think it’s close enough that my response to this version should be compatible with the response to his intended argument. The value PBT has over manual unit testing is in the combinatorial explosion of boundary conditions and edge cases in even a moderately-complex problem.

The Geometry of Tests

Consider a function with one integer input:

def f(x: int):
  ...

We can imagine the function as a 2D graph, like the kind you see in high school. That means you can see a “spatial” element to the inputs: it is a one-dimensional line, describing the entire input space. If we were writing all tests manually, we’d try a few “ok” inputs and then some pathological cases:

0
1
negative numbers
very large numbers
INT_MIN and INT_MAX, if you have those

That’s in the realm of feasibility for manual testing. If there are any edge cases in the specification of the problem, you’d probably see them.

What if we have two integer inputs?

def g(x: int, y: int):
  ...

Now the input space is two-dimensional. This immediately squares our pathological cases: not only do we have to test the edge cases for both x and y, we have specifically test all combinations where both of them are pathological! But beyond that, we also have new edge cases that stem from the geometry of the space:

x = y
x = -y
x < y
x > y
x >>> y, x <<< y
x divides/is divisible by y
x and y are coprime

Again, with some cleverness and studying of the spec, you can probably identify what the actual problematic edges are and test them manually. Also, you can write tests that hit multiple of these at once, like testing (1, 1). But even so, you can see that the extra structure of having two variables leads to more edge cases. A 2D plane has a lot more complexity than 1D line. And a 3-space has more complexity than a 2D plane- we’d expect another set of edge cases in the interactions of three variables together.

I make the “input space” sound like a geometric thing, but it’s a poor analogy. There’s no geoemetric analog for a string input, or a list of numbers1, or a nested dict. This weakens our ability to test, since we can’t hypothesis problematic edges by thinking about the geometry. More edge cases become unintuitive, and it’s more likely that someone could miss them. Randomized testing becomes increasingly useful as it becomes harder for us to think through all the edges of a system.

(A common “intuition” for this is that for an 10-dimensional hypercube, “99.75% of the volume is in the corners”. But I’m already including “interesting substructure of the geometry” as examples of edge cases, while that post builds the intuition off edge cases being values “far from the center”. And I also already said that the geometric analogy breaks down. The link is a good post but reading both our articles back-to-back is prolly a bad idea.)

HongQuanTo commented 3 years ago

Nice capture and note. :thinking: Apply tool chain from (input-analysis -> PTB) + (spec-clever -> selected-properties) -> decision-table = very huge test matrix for all imaginable edges. This can be achieve with some sort of test case auto generation tools. Then, looking at the matrix (instead of a shape or a map) to select which to test, which to hope (for not happened) is our choice base on the project context.

tieubao commented 3 years ago

@HongQuanTo agree

For us we use the input space metaphor not only to find the boundary but also to guide the engineers to the broaden approach, later then they can apply it to formal verification, which is an ultimate goal of our software quality

tieubao / til

Why Property Testing Finds Bugs Unit Testing Does Not #541

The Geometry of Tests