Make some generators more efficient by using splitmix primitives.

phadej commented 4 years ago

SplitMix generates uniform Word64, by using this knowledge we can generate numbers more efficiently.

bitmaskWithRejection64' n behaves as choose (0, n), but is more efficient.

In a small tests-suite generating list of characters, the speedup is noticeable (max-size is 1000):

CharSet
  fromList . toList:     OK (23.73s)
    +++ OK, passed 100000 tests.
  member:                OK (22.50s)
    +++ OK, passed 100000 tests.
  union:                 OK (49.02s)
    +++ OK, passed 100000 tests.
  union: left identity:  OK (22.99s)
    +++ OK, passed 100000 tests.
  union: right identity: OK (22.81s)
    +++ OK, passed 100000 tests.
  union: commutativity:  OK (48.55s)
    +++ OK, passed 100000 tests.
  union: associativity:  OK (76.98s)
    +++ OK, passed 100000 tests.

speeds up to:

CharSet
  fromList . toList:     OK (17.49s)
    +++ OK, passed 100000 tests.
  member:                OK (16.31s)
    +++ OK, passed 100000 tests.
  union:                 OK (36.63s)
    +++ OK, passed 100000 tests.
  union: left identity:  OK (17.06s)
    +++ OK, passed 100000 tests.
  union: right identity: OK (16.79s)
    +++ OK, passed 100000 tests.
  union: commutativity:  OK (36.23s)
    +++ OK, passed 100000 tests.
  union: associativity:  OK (58.36s)
    +++ OK, passed 100000 tests.

phadej commented 4 years ago

This is WIP, I noticed that arbitrarySizedBoundedIntegral is not uniform, and have to think how to make Word64 etc generation fast yet take that into account too.

(coincidentally, because of non uniform generation, test-quickcheck-generators runs faster without this change).

EDIT: so now only changes how Char is generated.

phadej commented 4 years ago

arbitraryWord64 is not exported; it generates uniform Word64. I think we should export it (and remember to add @since annotation).

phadej commented 4 years ago

ping

nick8325 commented 4 years ago

Those are nice speedups!

I had a go at making a wrapper around bitmaskWithRejection64, so that this can be used as a replacement for choose. I ended up with this:

chooseInt :: (Int, Int) -> Gen Int
chooseInt (lo, hi) = do
  w <- chooseUpTo (fromIntegral hi - fromIntegral lo)
  return (fromIntegral (w + fromIntegral lo))

chooseUpTo :: Word64 -> Gen Word64
chooseUpTo n =
  MkGen $ \(QCGen g) _ ->
    -- We need to special-case n == maxBound, but may as well do it for all powers of two
    if n .&. (n+1) == 0 && n /= 0 then
      fst (nextWord64 g) .&. n
    else
      fst (bitmaskWithRejection64 (n+1) g)

When I use the previous definitions of arbitrary{Unicode,ASCII}Char but using chooseInt instead of choose, it seems to be a tiny bit faster than your patch. So it seems like a good idea to add something like this to QuickCheck.

However, there is a whole load of code out there that currently calls choose. It would be nice if they could get the benefits of this speed-up too, but I can't think of a good way to do it (without breaking lots of existing code). Any ideas?

phadej commented 4 years ago

@nick8325 I see. ~~Looks like I have a bug in bitmaskWithRejection64' which affects performance.~~

EDIT, no I don't. At least not what I thought about.

phadej commented 4 years ago

@nick8325 is your chooseUpTo faster than bitmaskWithRejection64' (with a prime, which picks from closed-closed interval).

If it so, I'll rather use that in splitmix.

I suspect that in case of arbitraryASCIIChar the resulting GHC Core should be very close, though bitmaskWithRejection64' would do an extra > comparison (to reject overflows, which is never true).

About choose, there seems to be some work to make random not cripple underlying generators in https://github.com/idontgetoutmuch/random/pull/1, so I wouldn't try to do anything like that in QuickCheck.

Something which would benefit fromchooseInt like thing is arbitrarySizedBoundedIntegral, generating bare numbers is I common. I didn't manage to get it right though when I tried. (There could be a variant of arbitrarySizedBoundedIntegral when we know that the range is exactly 2 ^ n).

nick8325 commented 4 years ago

Ah, I hadn't noticed bitmaskWithRejection64' before (I was looking at an old version of splitmix, apparently). Things are nice and fast when I just use that directly.

But there is a performance bug in the unprimed bitmaskWithRejection64: when you call it with range a power of two, the mask is 1 bit longer than what's actually needed. For example, bitmaskWithRejection64 1024 uses mask = 2047 when mask = 1023 would suffice. So, half the words generated would be rejected. This could certainly explain the performance difference.

I also tried implementing chooseInteger :: (Integer, Integer) -> Integer, which uses bitmaskWithRejection64 if the arguments are within the range of a Word64 and falls back to choose otherwise. This is slightly slower than chooseInt, but not by much (rough figures from a benchmark I ran: 15s using chooseInt, 16s using chooseInteger, 30s using choose).

I think a simple way we could do this is to add chooseInt and chooseInteger, and just go through QuickCheck and replace each call to choose with one of them where possible. That should also handle arbitrarySizedBoundedIntegral. Hopefully, this will even become unnecessary once the improvements to random get released.

About handling power of two ranges specially - I originally marked chooseUpTo INLINE for that reason (if the range is a power-of-two constant then the inlined code simplifies to a call to nextWord64 and a bitwise AND), but it didn't seem to help performance. If we statically know that the range is a power of two then potentially we can save a comparison and a call to countLeadingZeros, but perhaps this isn't very significant. We can measure of course.

nick8325 commented 4 years ago

I just pushed a patch that adds chooseInt, chooseInteger, chooseBoundedIntegral and chooseEnum functions, which have the same API as choose but use bitmaskWithRejection64' if possible (this means, if the lower and upper bounds of the range fit inside an Int64). I replaced all calls to choose inside QuickCheck with one of these functions.

On my machine, the time taken to generate 100000 random strings goes down from 56s to 11s! Also, the tests in the 'examples' directory run twice as fast, perhaps because oneof now uses chooseInt instead of choose.

The definition of arbitrarySizedBoundedIntegral still goes through Integer, so there may be room for speedups here. Also, generating a uniform Word64 will currently invoke choose, because it doesn't fit inside Int64.

Could you perhaps try your benchmarks with the new patch and report back?

nick8325 commented 4 years ago

Oh, looks like I broke something. Hold on...

phadej commented 4 years ago

Int64 and Word64 are exactly the same bitwise. I don't understand

Also, generating a uniform Word64 will currently invoke choose, because it doesn't fit inside Int64.

nick8325 commented 4 years ago

chooseBoundedIntegral (lo, hi) checks that both lo and hi are in the range of an Int64 (and otherwise reverts to calling choose). So chooseBoundedIntegral (minBound, maxBound) :: Gen Word64 will fail that check because maxBound :: Word64 is outside the range of an Int64.

Can obviously be fixed, just needs a bit more programming...

phadej commented 4 years ago

@nick8325 I see. This is exactly why I didn't change it myself. Thinking about over/undeflows is tricky, as type-system doesn't help there.

nick8325 commented 4 years ago

It seems that arbitrarySizedBoundedIntegral is indeed still slow. I think it can be fixed, but I'll need to think a bit more...

nick8325 commented 4 years ago

(But arbitraryBoundedIntegral on Word64 is fast now, at least...)

nick8325 commented 4 years ago

OK, now arbitrarySizedBoundedIntegral is decently fast too!

phadej commented 4 years ago

@nick8325 great!

In the test-suite which prompted this patch (Char heavy) whole test-suite running time went down from roughly from 7.5s to 3.5s using current master 9f678a938325f2ce113c46b1123bf6c51fd825ad of QuickCheck. This is very nice!

Few distributions (with label) I print out looks similar.

Do you want me do something additional?

nick8325 commented 4 years ago

No, that's fine! I'll make a new release of QuickCheck soon.

nick8325 / quickcheck

Make some generators more efficient by using splitmix primitives. #287