RFE: don't re-test duplicate inputs

nick8325 / quickcheck

Automatic testing of Haskell programs.

Other

713 stars 119 forks source link

RFE: don't re-test duplicate inputs #272

Closed symbiont-sam-halliday closed 4 years ago

symbiont-sam-halliday commented 4 years ago

When writing a shrinker it is impossible to know which inputs have been previously attempted, and since forAllShrink has no notion of Eq a or Ord a it is not able to deduplicate.

It would be really efficient if there was an alternative forAllShrink that had an Ord a requirement and could therefore avoid running duplicate shrinks, making shrinking much faster for complex minimisations.

Note: I don't mean just running nub on the result of shrink. I mean checking the result of shrink against the list of everything that has already been attempted in order to determine if it succeeds or fails without actually doing the work again.

Lysxia commented 4 years ago

What kind of data are you trying to shrink that way? The problem doesn't seem to apply to tree-shaped data, even pairs (assuming the individual components don't have that problem in the first place), or am I missing something? That leaves primitives like numbers. Is the duplication actually significant?

Rather than a new shrinking strategy based on a shrink :: a -> [a] function, another solution is to use a binary tree of shrunk values, so that we go left if the current shrink passed the test, or we go right if it failed: that is expressive enough to let users encode any shrinking strategy, which may also keep the opportunity open for something more efficient than keeping track of a set of values that have been tried...

symbiont-sam-halliday commented 4 years ago

that's an interesting approach!

The kind of data I'm trying to shrink is too complex to explain in detail here (and also proprietary), it's a domain model containing a list of actions and there are only certain moves that I can make to transform those actions into other actions that keep the list valid.

The kind of problem I'm coming up against is perhaps described in this much simplified example: imagine we have a list of actions

data Action = A | B | C | D

and we want to shrink [Action] -> [[Action]] at every step.

One rule I might apply is to remove one A, one B, etc. There might be more complex rules such as C can only be removed from the end, or only after Ds or somesuch.

This means we end up with duplicate tests because I might reach a node in the shrink graph by removing the first A then removing the first B. Or I could reach exactly the same state by removing the first B then removing the first A. Does that help explain?

kosmikus commented 4 years ago

For tricky scenarios, QuickCheck provides a way to write more sophisticated shrinkers using Shrinking / ShrinkState. Would that help in your situation?

symbiont-sam-halliday commented 4 years ago

I was not aware of this, thanks! https://hackage.haskell.org/package/QuickCheck-2.13.2/docs/Test-QuickCheck.html#t:Shrinking

I haven't used it yet, but it seems to allow me to do exactly what I want: do a lookup to make sure I don't produce shrinks that have previously been tested.

I am trying my best to avoid using Arbitrary, and instead manually passing generators and shrinkers. Not a terribly big deal, but it would be nice if there was a way to use it that way, e.g. a forAllShrinking.

Thanks! I'll reopen if that doesn't work out for me.

symbiont-sam-halliday commented 4 years ago

I tried to use this today and was reminded that I can't use Arbitrary because my Gen functions require a bunch of input context that I can't inject if I'm using the Arbitrary interface. I'll create a fresh ticket to be able to use Shrinking without the Arbitrary machinary.

symbiont-sam-halliday commented 4 years ago

I created https://github.com/nick8325/quickcheck/issues/273