Open rrnewton opened 8 years ago
Yes, pcg-random. Specifically the System.Random.PCG.Fast.Pure
module where Gen
is a MutableByteArray
. I also changed this line to bump the bytes allocated to 48 rounding the whole MutableByteArray
allocated to one 64-byte cache-line on my 64-bit machine to avoid contention between PRNG's allocated together. Without that I observed bad performance in the wild.
Is that one line patch linked there look correct vis a vis what you said?
That looks right to me. newByteArray
allocates an StgArrWords
which has a StgHeader
(8 bytes) then an StgWord
(8 bytes) then the number of bytes requested (48 bytes) giving a total of 8+8+48=64
bytes. So the 8 bytes that PCG uses will always be on a different cache-line then another PCG (modulo associativity).
Oops... inside the monad-toss business there is a performance bug. It uses a very slow RNG.
@fryguybob recommended ... which was it? pcg-random? Was that the one that offered good parallel scalability.
I think there was also another perf bug we discussed at ICFP, but I've forgotten it now.