Closed barrucadu closed 6 years ago
In this case, the counter-example is just too big for LeanCheck.
Here is a mock property that simulates your bug:
prop :: DepState -> TVarId -> Id -> ThreadAction -> ThreadAction -> Bool
prop ds
(TVarId (Id Nothing 0))
(Id Nothing 0)
(BlockedSTM [TWrite (TVarId (Id Nothing 0))])
(STM [TCatch [TRead (TVarId (Id Nothing 0))] Nothing] [])
| ds == DepState (M.fromList []) (Set.fromList []) (M.fromList [])
= False
prop _ _ _ _ _ = True
Even with 5 000 000 tests, LeanCheck does not find the bug:
> checkFor 5000000 prop
+++ OK, passed 5000000 tests.
If we simplify the property to take only the ThreadAction
arguments
prop45 :: ThreadAction -> ThreadAction -> Bool
prop45 (BlockedSTM [TWrite (TVarId (Id Nothing 0))])
(STM [TCatch [TRead (TVarId (Id Nothing 0))] Nothing] []) = False
prop45 _ _ = True
LeanCheck reports:
> checkFor 5000000 prop45
*** Failed! Falsifiable (after 3877262 tests):
(BlockedSTM [TWrite 0]) (STM [TCatch [TRead 0] Nothing] [])
It needed 3 877 262 tests. The tuple (BlockedSTM [TWrite 0], STM [TCatch [TRead 0] Nothing] [])
appears as the 3877262th element in the (ThreadAction, ThreadAction)
Listable
enumeration. It has size 3+5=8. I conjecture that for the 5-argument property, LeanCheck will take between 10 to 100 million tests to find the bug (you'll probably run out of memory before then).
> take 9 $ map length (tiers :: [[(ThreadAction, ThreadAction)]])
[121,748,2168,4998,13022,44300,193051,976794,5354127]
The enumeration of ThreadAction
s grows too fast because the datatype is too wide. The blow-up is kind of inevitable and not the fault of your Listable
instances. For the type of 5-tuples of the full property arguments, the tier of size 8 has 60 644 874 elements!
> take 9 $ map length $ (tiers :: [[(DepState, TVarId, Id, ThreadAction, ThreadAction)]])
[121,1474,9923,51149,229135,949572,3786689,14990753,60644874]
Not to the point of finding the bug. (Unless you cheat by omitting less relevant constructors on the definition of Listable ThreadAction
.)
You can rewrite Listable DepState
without using ><
and nested tuples:
instance Listable SCT.DepState where
tiers = mapT (\(a,b,c) -> SCT.DepState a b c) tiers
This is just aesthetic, the result is the same.
Instances for ThreadId
and others could have been written as:
instance Listable D.ThreadId where
tiers = reset $ cons1 D.ThreadId
Again, just aesthetic, the result is the same. cons1
adds 1 to size and reset
removes it.
Your definition of Listable Set
has repetitions, as you are constructing sets from lists. [0,1]
and [1,0]
will yield the same set. LeanCheck provides setCons
and setsOf
which are useful when building sets. An improved listable instance without repetitions is given below:
instance (Ord v, Listable v) => Listable (Set v) where
tiers = setCons Set.fromList
-- or alternatively: tiers = mapT Set.fromList $ setsOf tiers
It does make quite a significant difference to the number of DepState
s in each tier:
> take 12 $ map length $ (tiers :: [[DepState]]) -- with repetitions
[1,4,16,59,206,695,2287,7387,23524,74090,231323,717212]
> take 12 $ map length $ (tiers :: [[DepState]]) -- without repetitions
[1,4,15,54,183,604,1952,6212,19543,60940,188705,581103]
A similar thing could be done to improve the Map
instance. However, LeanCheck does not provide something readily available to build Map
s. This is something I may add in the near future.
I was thinking about this. Here is some extra information.
--- Quick-and-dirty function to check the size of a Listable value
size :: (Listable a, Eq a) => a -> Int
size x = fromJust $ L.findIndex (x `elem`) tiers
.
> size $ DepState (M.fromList []) (Set.fromList []) (M.fromList [])
0
> size $ TVarId (Id Nothing 0)
0
> size $ Id Nothing 0
0
> size $ BlockedSTM [TWrite (TVarId (Id Nothing 0))]
3
> size $ STM [TCatch [TRead (TVarId (Id Nothing 0))] Nothing] []
5
So, the total size of the counterexample is 8.
Given that
> take 9 $ map length $ (tiers :: [[(DepState,TVarId,Id,ThreadAction,ThreadAction)]])
[121,1474,9923,51149,229135,949572,3786689,14990753,60644874]
The counterexample will appear between 20 018 817 and 80 663 690 tests.
If we use the Set enumeration without repetitions:
> take 9 $ map length $ (tiers :: [[(DepState,TVarId,Id,ThreadAction,ThreadAction)]])
[121,1474,9802,49554,217375,883594,3470014,13608974,54942235]
The counterexample will appear between 18 240 909 and 73 183 143. Not a big difference. By defining Listable Map
without repetitions, this number should go down, but I don't think it will be enough to find the counterexample without running out of memory.
A slightly smaller counterexample is if the last parameter is STM [TRead (TVarId (Id Nothing 0))] []
, the TCatch
isn't necessary. What's important is that it's two transactions which overlap in TVar
s, where the first blocks.
I'm a little surprised Hedgehog didn't find this smaller one, but maybe the default number of shrinking steps allowed is too low.
Hey @barrucadu, LeanCheck actually finds a few counterexamples to DejaFu's property when configured to test up to 2 500 000 test cases:
$ git clone git@github.com:barrucadu/dejafu.git
$ cd dejafu
$ git checkout e8634810765c182af96cf85dd2f92c6c34787994
$ cd dejafu-tests
$ cabal sandbox add-source ../dejafu
$ cabal sandbox add-source ../hunit-dejafu
$ cabal sandbox add-source ../tasty-dejafu
$ cabal install --only-dependencies
$ vi Common.hs # then change 2500 to 2500000
$ cabal build
$ ./dist/build/dejafu-tests/dejafu-tests --select-tests="dependent ==> dependent"
Test Cases:
Properties:
SCT:
:dependent ==> dependent': [Failed]
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TRead 0]) (STM [TWrite 0] [])
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TRead 0]) (BlockedSTM [TWrite 0])
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (STM [TRead 0] [])
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (STM [TWrite 0] [])
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (BlockedSTM [TRead 0])
Failed for (DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (BlockedSTM [TWrite 0])
Test Cases Total
Passed 0 0
Failed 1 1
Total 1 1
(I assumed the bug is fixed on master/HEAD, so I checked out the changeset you quoted earlier.)
I also ran all other properties with 1 500 000 tests, and I only get errors on "dependent ==> dependent'".
As you can see above, I had to actually change the source code to configure the number of tests. However there are a few command line switches on dejafu-tests
that could potentially be used by the leancheck test-framework/hunit test driver to make it easier to activate an "extensive test" mode.
$ ./dist/build/dejafu-tests/dejafu-tests --help
Usage: dejafu-tests [OPTIONS]
--help show this help message
-j NUMBER --threads=NUMBER number of threads to use to run tests
--test-seed=NUMBER|random default seed for test random number generator
-a NUMBER --maximum-generated-tests=NUMBER how many automated tests something like QuickCheck should try, by default
--maximum-unsuitable-generated-tests=NUMBER how many unsuitable candidate tests something like QuickCheck should endure before giving up, by default
-s NUMBER --maximum-test-size=NUMBER to what size something like QuickCheck should test the properties, by default
-d NUMBER --maximum-test-depth=NUMBER to what depth something like SmallCheck should test the properties, by default
-o NUMBER --timeout=NUMBER how many seconds a test should be run for before giving up, by default
--no-timeout specifies that tests should be run without a timeout, by default
-l --list-tests list available tests but don't run any; useful to guide subsequent --select-tests
-t TEST-PATTERN --select-tests=TEST-PATTERN only tests that match at least one glob pattern given by an instance of this argument will be run
--jxml=FILE write a JUnit XML summary of the output to FILE
--jxml-nested use nested testsuites to represent groups in JUnit XML (not standards compliant)
--plain do not use any ANSI terminal features to display the test run
--color use ANSI terminal features to display the test run
--hide-successes hide sucessful tests, and only show failures
The switches are -a
, -s
or -d
. To make it easy to try more tests with LeanCheck, I could try to implement the use of these switches (by reading the global hunit/test-framework config), but I am not familiar with hunit/test-framework to do it on the spot. Maybe I'll do that one day if I get the time.
I have added a mapCons
function to LeanCheck. You can use it to build a Listable instance for maps like so:
instance (Ord k, Listable k, Listable v) => Listable (Map k v) where
tiers = mapCons M.fromList
It avoid repetitions. So counterexamples should appear earlier (an in fact they do).
Using your original Listable definitions for Set and Map, your quoted counterexample appears as the 1 340 809 th test:
> check prop
*** Failed! Falsifiable (after 1340809 tests):
(DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (STM [TRead 0] [])
After changing the definitions to use setCons
and mapCons
, it appears as the 842 017 th test:
> check prop
*** Failed! Falsifiable (after 842017 tests):
(DepState {depCRState = fromList [], depMVState = fromList [], depMaskState = fromList []}) 0 0 (BlockedSTM [TWrite 0]) (STM [TRead 0] [])
Using these improved instances and setting LeanCheck to run (¿just?) a million tests is sufficient to find the bug.
I plan to release a new version of LeanCheck shipped with mapCons
soon. For now, it is found on the master branch.
You may remember me saying a while ago that I added some LeanCheck tests to the dejafu testsuite, and they actually found bugs, whereas QuickCheck did not. I took that as fairly strong evidence that random value generation did not work well in that particular case.
I've just been experimenting with converting those tests to use Hedgehog, as I added some other Hedgehog tests and it seemed silly to depend on three different property testing libraries. I expected Hedgehog to do as poorly as QuickCheck, so imagine my surprise when Hedgehog found a bug triggered by a small counterexample, which LeanCheck didn't!
Here's the property and the counterexample found by Hedgehog:
And indeed it is a bug. In this function, I am missing a case
(BlockedSTM _, WillSTM) -> True
.The generated values all seem pretty small to me:
DepState [] [] []
,0
,0
,BlockedSTM [TWrite 0]
,STM [TCatch [TRead 0] Nothing] []
Here's the LeanCheck property. Are the
Listable
instances formulated poorly? Or could it be because the LeanCheck version of the property uses(==>)
, which doesn't retry with new input if the predicate fails; whereas Hedgehog does?