rudymatela / leancheck

enumerative property-based testing for Haskell
https://hackage.haskell.org/package/leancheck
Other
52 stars 8 forks source link

Tutorial: Insufficient properties for correct sort implementation #9

Closed csternagel closed 5 years ago

csternagel commented 5 years ago

Testing whether a sorting function sort is correct involves two parts:

  1. Test that the result of sort is actually sorted, and
  2. check that all values occur exactly the same number of times in the input and the output of sort.

Part (1) is achieved by your property prop_ordered, but for part (2) it is not enough to just check for equal length and that each element occurs in the input if and only if it occurs in the output.

Otherwise a function that would behave like sort [1,3,2,1] = [1,2,2,3] would be considered a correct sorting function.

Therefore I suggest to instead use a property that is based on a function count :: Eq a => a -> [a] that counts how often an element occurs in a given list:

prop_count :: Eq a => a -> [a] -> Bool
prop_count x xs = count x (sort xs) == count x xs
rudymatela commented 5 years ago

@csternagel Thanks for the suggestion.

Otherwise a function that would behave like sort [1,3,2,1] = [1,2,2,3] would be considered a correct sorting function.

I know ;-) I think the FitSpec tool may interest you: it can automatically find alternative implementations of functions that still pass given tests. The example in its README deals exactly with what you are describing, reporting sort [0,0,1] = [0,1,1] as part of an alternative implementation of sort that still passes the trio of properties in LeanCheck's tutorial. The research paper about FitSpec also deals with the exact same example.

Therefore I suggest to instead use a property that is based on a function count :: Eq a => a -> [a] that counts how often an element occurs in a given list

I have actually used count all along in the eg/test-sort.hs example albeit I didn't use it in the tutorial. The intent of the tutorial is more to teach property-based testing, so I think sticking with prop_length is better there: it is simpler to understand and it only takes an argument. I think it makes the tutorial more beginner friendly.

Nevertheless, I think mentioning prop_count there is a good idea. Based on your suggestion, I've added a small paragraph to the tutorial challenging the reader to come up with an implementation to prop_count, see: 9474cd7036386da07efb204995d9f24950f7e93d