Open langfield opened 3 months ago
To motivate this a little, very often we have data invariants associated with some type, and we only want to generate valid inhabitants. With QuickCheck, we can do e.g.
data BST a = Empty | Node (BST a) a (BST a) deriving (Eq, Show)
sizedBST :: (Arbitrary a, Ord a, Random a, Enum a) =>
a -> a -> Int -> Gen (BST a)
sizedBST lo hi 0 = return Empty
sizedBST lo hi n =
oneof [do
v <- choose (lo,hi)
l <- sizedBST lo (pred v) (n `div` 2)
r <- sizedBST (succ v) hi (n `div` 2)
return $ Node l v r]
which generates a valid binary search tree with values within certain bounds. (See here for the notes from which this was taken.)
I cannot figure out a good way to do this with LeanCheck. π
I've been using LeanCheck for a few months now and it's wonderful! :)
@langfield, I am glad you appreciate LeanCheck. :grin:
To motivate this a little, very often we have data invariants associated with some type, and we only want to generate valid inhabitants.
When you have data invariants, I find it best to delegate dealing with those to the smart constructors for the data type. Expanding your BST example a bit:
data BST a = Empty
| Node (BST a) a (BST a)
deriving (Eq, Show)
-- inserts an element into a BST
insert :: Ord a => a -> BST a -> BST a
insert x Empty = Node Empty x Empty
insert x (Node l y r)
| x < y = Node (insert x l) y r
| x > y = Node l y (insert x r)
| otherwise = Node l y r -- no-op for repeated elements
-- converts a list of elements into a BST
fromList :: Ord a => [a] -> BST a
fromList = foldr insert Empty
With insert
, we can write the following Listable
instance for our BST
:
-- variation 1
instance (Listable a, Ord a) => Listable (BST a) where
tiers = cons0 Empty \/ cons2 insert
This one respects the data invariant, all generated BSTs are valid.
With fromList
, we can have an even shorter instance:
-- variation 2
instance (Listable a, Ord a) => Listable (BST a) where
tiers = cons1 fromList
These two instances are equivalent and will yield the same list (and tiers) of BSTs. One disadvantage is that there are repetitions:
> take 12 list :: [BST Int]
[ Empty
, Node Empty 0 Empty
, Node Empty 0 Empty
, Node Empty 1 Empty
, Node Empty 0 Empty
, Node (Node Empty 0 Empty) 1 Empty
, Node Empty 0 (Node Empty 1 Empty)
, Node Empty (-1) Empty
, Node Empty 0 Empty
, Node (Node Empty 0 Empty) 1 Empty
, Node Empty 0 (Node Empty 1 Empty)
, Node Empty (-1) (Node Empty 0 Empty)
]
These repetitions yield from the fact that some different inputs to fromList
yield the same BST: e.g. [0,2,1] and [2,0,1]:
> fromList [0,2,1]
Node (Node Empty 0 Empty) 1 (Node Empty 2 Empty)
> fromList [2,0,1]
Node (Node Empty 0 Empty) 1 (Node Empty 2 Empty)
If you want to avoid these repetitions, you can use the setCons
constructor to guarantee that fromList
will be applied to set-unique lists.
-- varation 3
instance (Listable a, Ord a) => Listable (BST a) where
tiers = setCons fromList
This instance respect the data invariants and has no repetitions:
> take 12 list :: [BST Int]
[ Empty
, Node Empty 0 Empty
, Node Empty 1 Empty
, Node (Node Empty 0 Empty) 1 Empty
, Node Empty (-1) Empty
, Node Empty (-1) (Node Empty 0 Empty)
, Node Empty 2 Empty
, Node (Node Empty 0 Empty) 2 Empty
, Node Empty (-1) (Node Empty 1 Empty)
, Node Empty (-2) Empty
, Node Empty (-1) (Node (Node Empty 0 Empty) 1 Empty)
, Node Empty (-2) (Node Empty 0 Empty)
]
This enumeration is quite restrictive, you only get a single possible arrangement for a BST given a set of values. Depending on your application, this can be good or bad. When in doubt, I would say this is the way to go. This is what I use in leancheck-instances package to generate values of the Set datatype.
If you are interested in generating all possible ways that a BST can be arranged, you can do the following:
import Test.LeanCheck
import Data.List (inits, tails, sort)
-- generates all possibilities of arranging a BST
-- from a given set of elements
possibilities :: Ord a => [a] -> [BST a]
possibilities = ps . sort
where
ps [] = [ Empty ]
ps xs = [ Node l x r
| (ys,x:zs) <- zip (inits xs) (tails xs)
, l <- ps ys
, r <- ps zs
]
-- variation 4
instance (Listable a, Ord a) => Listable (BST a) where
tiers = map concat
. mapT possibilities
$ setsOf tiers
This generates unique trees in respect to structural equality:
> take 12 list :: [BST Word]
[ Empty
, Node Empty 0 Empty
, Node Empty 1 Empty
, Node Empty 0 (Node Empty 1 Empty)
, Node (Node Empty 0 Empty) 1 Empty
, Node Empty (-1) Empty
, Node Empty (-1) (Node Empty 0 Empty)
, Node (Node Empty (-1) Empty) 0 Empty
, Node Empty 2 Empty
, Node Empty 0 (Node Empty 2 Empty)
, Node (Node Empty 0 Empty) 2 Empty
, Node Empty (-1) (Node Empty 1 Empty)
]
... there are repetitions wrt. the set of elements.
Variation 4 uses setsOf
and mapT
which are also built-in'to LeanCheck. The weird map concat
at the end is so the nested tiers of lists of BSTs ([[ [BST a] ]]
) is flattened back into "regular" tiers of BSTs ([[BST a]]
).
Depending on what you want, you should use something like variation 3 or variation 4. In doubt, I would go with 3 as it is simpler.
Hopefully this helps. :smile:
At some point in the next weeks, I think I should perhaps:
eg/
folderbagCons
doc to make it consistent with its sister functions.This enumeration is quite restrictive, you only get a single possible arrangement for a BST given a set of values.
Yes I came up with variation 3 myself, but opted to use QuickCheck instead because of this limitation. However, your variation 4 looks like exactly what I'm looking for!
Thanks so much for writing out such a detailed response. Is the stuff in doc/
and eg/
accessible from the hackage docs? I'll have to read them over.
Feel free to close this issue as you see fit. ππ»
Is the stuff in doc/ and eg/ accessible from the Hackage docs?
Sort of:
... but the rendering is not nice. You get the raw markdown and sources without syntax highlighting. These are better read here on GitHub:
... I kind of forgot to answer your original question though...
Does there exist an analogue of QuickCheck's
choose
function?
Not really. For now I don't think there's anything analogous to choose
.
The (my :sweat_smile:) intended way to generate specialized enumerations for LeanCheck is to:
setCons
, bagCons
, mapCons
, setsOf
, etc...; orYou can get good enumerations with 1 and 2 in most cases. Sometimes requiring less thinking than in QC, sometimes require more thinking than in QC.
I don't think creating something analogous to choose
is impossible in LeanCheck, but for it I would have to implement a monadic interface for LC first (#16). This one is quite far down in my TODO list though... :sweat_smile:
... I kind of forgot to answer your original question though...
Does there exist an analogue of QuickCheck's
choose
function?Not really. For now I don't think there's anything analogous to
choose
.The (my π ) intended way to generate specialized enumerations for LeanCheck is to:
1. use smart constructors paired with `setCons`, `bagCons`, `mapCons`, `setsOf`, etc...; or 2. when 1 does not work, manipulate the enumeration directly: in LeanCheck enumeration is constructed as a simple list of lists of values: more specifically a possibly infinite list of finite lists.
You can get good enumerations with 1 and 2 in most cases. Sometimes requiring less thinking than in QC, sometimes require more thinking than in QC.
I don't think creating something analogous to
choose
is impossible in LeanCheck, but for it I would have to implement a monadic interface for LC first (#16). This one is quite far down in my TODO list though... π
Yes it seems superfluous, I hadn't realized it was quite so easy to manipulate the enumeration directly. I haven't written any Listable
instances that weren't straightforward boilerplate uses of consN
.
By the way, it has occurred to me that LeanCheck must "miss" values because of Cantor's theorem. For example if you were enumerating Set Int
, you are essentially attempting to generate $\mathcal{P}(\mathbb{N})$, no? I searched for "diagonalization", "Cantor", and "power set" in your thesis and didn't see anything addressing this issue. Is it not a problem in practice?
(Please correct if I have made an incorrect assumption.)
Is the stuff in doc/ and eg/ accessible from the Hackage docs?
Sort of:
* https://hackage.haskell.org/package/leancheck/src/doc/ * https://hackage.haskell.org/package/leancheck/src/eg/
... but the rendering is not nice. You get the raw markdown and sources without syntax highlighting. These are better read here on GitHub:
* https://github.com/rudymatela/leancheck/tree/master/eg * https://github.com/rudymatela/leancheck/tree/master/doc
Ah yeah I should have said "easily discoverable", haha. Although a more prudent person than myself would have read through directories with names like those!
By the way, it has occurred to me that LeanCheck must "miss" values because of Cantor's theorem.
Yes and no. It misses values fome some types, but not for Set Int
actually...
For example if you were enumerating
Set Int
, you are essentially attempting to generate Powerset(N) , no?
Int
is a finite set. So the powerset of Int
s is a finite set too! This makes it countable/enumerable.
Of course, you probably mean Set Integer
. However the Haskell Set
type can ony hold finite sets. We can represent arbitrary finite sets such as Set.fromList [2,4,6]
, or Set.fromList [2,3,5,7]
but we can never represent infinite sets such as "the set of even numbers"; "the set of primes"; or even "the full set of Integers". This is enough to make Cantor's argument not to apply in this case. (Set
is even capped at maxBound::Int
elements, however, just restricting instances to be finite is enough for Cantor's argument not to apply.)
The enumerations of Set Int
and Set Integer
are complete: provided you have infinite time (and memory) you will reach any finite set you can think of. I think even Set (Set Integer)
is complete as well.
The Haskell Set
is to the maths set what Double
is to the real number set.
So what values does LeanCheck misses, and for which types?
LeanCheck is able to completely enumerate finite lists of integers: [Integer]
. There are no holes in this enumeration when you consider only finite lists. However, with the list type, we can indeed represent infinite lists, such as the list of even positives [2,4..]
or the list of odd positives [1,3..]
. Infinite lists are not included in the enumeration. These are exactly the values that LeanCheck misses wrt. the diagonal argument.
LeanCheck is primarily a testing tool, so not having infinite lists being generated in the standard/default enumeration is a good thing. We want our strict properties to terminate:
prop_sort_idempotent :: [Int] -> Bool
porp_sort_idempotent xs = sort (sort xs) == sort xs
There are (lazy) properties that can/should require infinite lists, in these cases one could generate a class of infinite lists (such as periodic lists) using a wrapper newtype
:
prop_requiring_infty :: Periodic [Int] -> Bool
prop_requiring_infty xs = ...
The Periodic
newtype constructor is not available by default in LeanCheck, my point here is one could make one.
Another LeanCheck enumeration that misses a lot of values is the one for functions (not enabled by default but available in Test.LeanCheck.Function
).
Summary of enumeration completeness:
Int
: complete enumeration -- finite thus countable setInteger
: complete enumeration -- infinite but countable setDouble
: "complete" enumeration -- finite thus countable set
-- "complete" is in quotes here because we intentionally omit NaN
and -0
.Rational
: complete enumeration -- infinite but countable set[Integer]
: complete enumeration of finite list values (countable set)
--- but no enumeration of infinite list values (impossible/uncountable/not too useful)For recursive data types, such as lists and BSTs, enumerations are generally complete wrt. finite values. Infinite values are completely omitted.
Another example I thought of: if you are generating AVL trees, it is possible to generate all of them as the data invariant requires them to have a finite height.
... good question btw. I probably should include a discussion of this in LeanCheck's documentation at some point. For now at least it is here on the issue tracker. :smile:
I've been using LeanCheck for a few months now and it's wonderful! :) However I often find myself wanting to generate an example of some kind of data structure and then sample an element from it. With LeanCheck this is pretty cumbersome, I often just generate an extra element and then add it to the data structure, but that isn't very nice because it ends up "near the boundary".
Is there something like QuickCheck's
choose
that will let me do this?