With respect to this comment, I don't think you necessarily need Hypothesis to generate more complex data.
Here's the current test:
def test_series_with_interval_index_to_histogram():
    obs_hist, obs_bin_edges = processing._series_with_interval_index_to_histogram(
        pd.Series(
            [1],
            pd.IntervalIndex.from_tuples([(0.5, 1.5)]),
            dtype=int,
            name="bmi",
        )
    )
    exp_hist = np.array([1])
    exp_bin_edges = np.array([0.5, 1.5], dtype=float)
    assert np.array_equal(obs_hist, exp_hist)
    assert np.array_equal(obs_bin_edges, exp_bin_edges)
All I'm suggesting is that you replace the test data with something like:
pd.Series(
    [1, 2, 3],
    pd.IntervalIndex.from_tuples([(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]),
    dtype=int,
    name="bmi",
)
And then assert that:
exp_bin_edges = np.array([0.5, 1.5, 2.5, 3.5], dtype=float)
assert np.array_equal(obs_bin_edges, exp_bin_edges)
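Putting those pieces together, the updated test might look something like the sketch below. (I've assumed the expected counts mirror the series values, as in the current test.)

def test_series_with_interval_index_to_histogram():
    obs_hist, obs_bin_edges = processing._series_with_interval_index_to_histogram(
        pd.Series(
            [1, 2, 3],
            pd.IntervalIndex.from_tuples([(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]),
            dtype=int,
            name="bmi",
        )
    )
    # Assumed: the counts come straight from the series values.
    exp_hist = np.array([1, 2, 3])
    exp_bin_edges = np.array([0.5, 1.5, 2.5, 3.5], dtype=float)
    assert np.array_equal(obs_hist, exp_hist)
    assert np.array_equal(obs_bin_edges, exp_bin_edges)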
The current test would pass with an implementation that did something like:
def _series_with_interval_index_to_histogram(series):
    # Only the outermost edges are returned, so a single-bin test
    # can't tell this apart from a correct implementation.
    hist = series.values
    bin_edges = [series.index.left.values[0], series.index.right.values[-1]]
    return hist, bin_edges
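To make that concrete, here's a quick hypothetical check of the stub against the two data sets above (this assumes the stub and the usual numpy/pandas imports; it isn't part of the test suite):

import numpy as np
import pandas as pd

single = pd.Series([1], pd.IntervalIndex.from_tuples([(0.5, 1.5)]), dtype=int, name="bmi")
hist, bin_edges = _series_with_interval_index_to_histogram(single)
assert np.array_equal(hist, np.array([1]))              # passes
assert np.array_equal(bin_edges, np.array([0.5, 1.5]))  # passes: only two edges exist

triple = pd.Series(
    [1, 2, 3],
    pd.IntervalIndex.from_tuples([(0.5, 1.5), (1.5, 2.5), (2.5, 3.5)]),
    dtype=int,
    name="bmi",
)
hist, bin_edges = _series_with_interval_index_to_histogram(triple)
# bin_edges is now [0.5, 3.5], so the stricter assertion would fail:
# assert np.array_equal(bin_edges, np.array([0.5, 1.5, 2.5, 3.5]))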
From reading the current implementation, I'm as sure as I can be that it's correct! But I think it's worth getting into the habit of creating test data that isn't the simplest possible thing, for times when the implementation is less obviously correct. That way, you reduce the risk of tests passing only because they don't exercise the full range of behaviour of the function under test.
There's an analogy here with how 100% branch coverage doesn't actually mean all the code is being tested, because multiple logical paths often run through the same code path. I'll try to think of some examples.
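Here's one hypothetical sketch (not from this codebase): two tests give 100% branch coverage, but several logical paths through the same branches are never exercised.

def apply_discount(price, discount):
    if discount:
        return price - discount
    return price


def test_with_discount():
    assert apply_discount(10, 2) == 8   # covers the "if" branch


def test_without_discount():
    assert apply_discount(10, 0) == 10  # covers the fall-through branch

Both branches are covered, yet the logical paths where the discount exceeds the price (returning a negative total) or is itself negative (increasing the price) run through exactly the same code and are never checked.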
#46 introduces some new tests, but they aren't very complex (indeed, they were written to be extremely simple), so it's likely that they're ineffective. We should consider using property-based testing, with support from Hypothesis.
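For what it's worth, a property-based test for this function might look something like the sketch below. This is only a sketch: the strategy and the asserted invariants (one more edge than bins, edges matching the index boundaries, counts matching the series values) are my assumptions about what the function should guarantee, and processing refers to the module under test as in the existing tests.

import numpy as np
import pandas as pd
from hypothesis import given
from hypothesis import strategies as st


@st.composite
def interval_indexed_series(draw):
    # Draw strictly increasing bin edges, then one integer count per bin.
    edges = draw(
        st.lists(
            st.floats(min_value=0, max_value=100, allow_nan=False),
            min_size=2,
            max_size=10,
            unique=True,
        ).map(sorted)
    )
    counts = draw(
        st.lists(
            st.integers(min_value=0, max_value=1_000),
            min_size=len(edges) - 1,
            max_size=len(edges) - 1,
        )
    )
    return pd.Series(counts, pd.IntervalIndex.from_breaks(edges), dtype=int, name="bmi")


@given(interval_indexed_series())
def test_series_with_interval_index_to_histogram(series):
    # `processing` is the module under test, imported as in the existing tests.
    hist, bin_edges = processing._series_with_interval_index_to_histogram(series)
    # Assumed invariants: one more edge than bins, and the edges and counts
    # come straight from the index and values of the series.
    assert len(bin_edges) == len(hist) + 1
    assert np.array_equal(bin_edges, np.append(series.index.left, series.index.right[-1]))
    assert np.array_equal(hist, series.values)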