opensafely-actions / cohort-report

Cohort-report generates a report for each variable in an input file
MIT License

Property-based testing #51

Closed iaindillingham closed 2 years ago

iaindillingham commented 3 years ago

#46 introduces some new tests, but they aren't very complex (indeed, they were written to be extremely simple). However, that simplicity makes it likely that they're ineffective. We should consider using property-based testing, with support from Hypothesis.
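
For illustration only, a Hypothesis property for this function might look something like the sketch below. The strategy, the asserted properties, and the cohortreport import path are assumptions for the sake of the example, not tests that exist in the repo:

import numpy as np
import pandas as pd
from hypothesis import given
from hypothesis import strategies as st

from cohortreport import processing  # assumed import path; match the repo's actual layout


@given(counts=st.lists(st.integers(min_value=0, max_value=1_000), min_size=1, max_size=20))
def test_histogram_round_trips_counts_and_edges(counts):
    # Build contiguous unit-width bins [0, 1], [1, 2], ... to match the generated counts.
    edges = np.arange(len(counts) + 1, dtype=float)
    series = pd.Series(
        counts,
        pd.IntervalIndex.from_breaks(edges),
        dtype=int,
        name="bmi",
    )

    hist, bin_edges = processing._series_with_interval_index_to_histogram(series)

    # Whatever the input, the histogram should reproduce the counts and the bin edges.
    assert np.array_equal(hist, np.array(counts))
    assert np.array_equal(bin_edges, edges)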

inglesp commented 3 years ago

With respect to this comment, I don't think you necessarily need Hypothesis to generate more complex data.

Here's the current test:

def test_series_with_interval_index_to_histogram():
    obs_hist, obs_bin_edges = processing._series_with_interval_index_to_histogram(
        pd.Series(
            [1],
            pd.IntervalIndex.from_tuples([(0.5, 1.5)]),
            dtype=int,
            name="bmi",
        )
    )

    exp_hist = np.array([1])
    exp_bin_edges = np.array([0.5, 1.5], dtype=float)
    assert np.array_equal(obs_hist, exp_hist)
    assert np.array_equal(obs_bin_edges, exp_bin_edges)

All I'm suggesting is that you replace the test data with something like:

        pd.Series(
            [1, 2, 3],
            pd.IntervalIndex.from_tuples([(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]),
            dtype=int,
            name="bmi",
        )

And then assert that:

    exp_bin_edges = np.array([0.5, 1.5, 2.5, 3.5], dtype=float)
    assert np.array_equal(obs_bin_edges, exp_bin_edges)
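
Put together, the updated test might read something like this (a sketch; exp_hist is inferred from the suggested data rather than stated above):

def test_series_with_interval_index_to_histogram():
    obs_hist, obs_bin_edges = processing._series_with_interval_index_to_histogram(
        pd.Series(
            [1, 2, 3],
            pd.IntervalIndex.from_tuples([(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]),
            dtype=int,
            name="bmi",
        )
    )

    # With three bins, the expected histogram and bin edges are no longer trivial.
    exp_hist = np.array([1, 2, 3])
    exp_bin_edges = np.array([0.5, 1.5, 2.5, 3.5], dtype=float)
    assert np.array_equal(obs_hist, exp_hist)
    assert np.array_equal(obs_bin_edges, exp_bin_edges)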

The current test would pass with an implementation that did something like:

def _series_with_interval_index_to_histogram(series):
    hist = series.values
    bin_edges = [series.index.left.values[0], series.index.right.values[-1]]
    return hist, bin_edges

With the three-bin data above, that implementation would return bin edges of [0.5, 3.5], so the stronger assertion would catch it. From reading the current implementation, I'm as sure as I can be that it's correct! But I think it's worth getting into the habit of creating test data that isn't the simplest possible thing, for times when the implementation is less obviously correct. That way, you reduce the risk of tests passing only because they don't exercise the full range of behaviour of the function under test.

There's an analogy here with how 100% branch coverage doesn't actually mean all the code is being tested, because there are often multiple logic paths which run through the same code path. I'll try to think of some examples.
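
One hypothetical illustration (not from this thread): the two tests below reach 100% branch coverage of describe, yet never exercise the path where both branches are taken.

def describe(n):
    parts = []
    if n % 2 == 0:
        parts.append("even")
    if n < 0:
        parts.append("negative")
    return " ".join(parts)


def test_describe_even():
    assert describe(2) == "even"  # takes the first branch, skips the second


def test_describe_negative():
    assert describe(-1) == "negative"  # skips the first branch, takes the second

Together these cover every branch, but the logic path where both conditions hold (e.g. describe(-2) == "even negative") is never tested, so a bug in how the two results are combined would go unnoticed.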