nick8325 / quickspec

Equational laws for free
BSD 3-Clause "New" or "Revised" License
249 stars 24 forks source link

Space leak in v1 (from Hackage) #14

Open Warbo opened 7 years ago

Warbo commented 7 years ago

I know v1 is old news, so I'm mostly posting this for posterity and don't mind if it gets closed as "won't fix".

QuickSpec v1, e.g. 0.9.6 from Hackage, has a pretty severe space leak, eating up GBs of RAM for very simple theories, like the following:

data Nat = Z | S Nat

z = Z
s = S
p (S x) = x

plus Z y = y
plus (S x) y = S (plus x y)

times Z y = Z
times (S x) y = plus y (times x y)

exp x Z = S Z
exp x (S y) = times x (exp x y)

max Z y = y
max (S x) Z = S x
max (S x) (S y) = S (max x y)

From what I can tell, the problem is exp building huge unary Nat values. From what I can tell, this shouldn't be a problem due to laziness: if we have x :: Nat and y :: Nat, we should be able to compare them with == or <= in constant space: if the outer constructors differ, return an appropriate Bool; if they're both S then perform a tail call to the inner thunks and garbage-collect the outer ones.

I dug into this a little, but gave up. My methodology was to build everything with profiling support, and run with +RTS -M100m -xc; this causes an out-of-memory exception when the heap reaches 100MB, and prints out a stack trace of the exception which helps to narrow down the problem. I found the following:

In Test.QuickSpec.Generate.generateTermsSatisfying there is a space leak on the following line:

quietly $ printf "%d terms, " (count sum length ts)

As far as I understand it, ts is a list built out of the nodes/leaves of the test tree up to the current depth. Getting the length forces this whole chunk of the test tree to be evaluated. Since the test tree is also used later on to do the actual exploration, it can't be garbage collected and hence leaks space.

If the shutUp option is given to this function, such messages are skipped, so the leak disappears. Unfortunately the Test.QuickSpec.Main module seems to hard-code shutUp to be False. I just deleted these lines to fix the problem (the Depth ... lines are fine; except they're sent to stdout when they should really go to stderr).

The next space leak showed up in Test.QuickSpec.TestTree.test'. It looks like the problem is partitionBy, which forces elements in order to sort them. The definition of partitionBy turns the given list into a list of pairs, with the original elements in the fst position and the processed elements in snd position; this list is sorted by the snd position, grouped by the snd position, then the fst values are extracted.

I tried changing this to avoid having these complete listsin memory at once. Rather than pairing up all elements with their processed version, I tried pushing the value function into the comparison function, so we'd take in two normal elements, send them through value and compare the results. This trades space for time: we don't have to store the results of running value on every element, but we do have to run value over and over as we compare.

This didn't seem to help; I imagine it's the input list itself that's too big. A further optimisation might be to push the evaluation of these list elements into the comparison function: accept values x and y from the list, clone their thunks, evaluate those thunks, send them through value and compare the results. This way, the thunks in the list will remain unevaluated, and the cloned thunks can be garbage-collected after each comparison. I gave up at this point because:

nick8325 commented 7 years ago

Yikes, that's awkward that just printing the statistics causes a space leak...

One trick you could try is using observational equality instead of Ord. For example, you can define an observer function for Nat that simply converts it into an integer. I think the Nats will then not be stored in memory.

In general, you can do this trick for any datatype by giving a hash function as the observer function.