Open OlivierSohn opened 6 years ago
That's interesting idea although I don't know anything about state of SIMD in GHC. AFAIR they never landed properly but I may be mistaken. It raises two interesting problems
Do we have SIMD? Since function is small we can get away by writing everything by hand.
Say we have SIMD and vectorized generator. Now what should we do with 4 Word32? This question is important even if we don't have vectorized generator. For example is we generate Word8 we can generate 4 at time! And 32 for Bools.
Do we have SIMD? Since function is small we can get away by writing everything by hand.
I'm not sure to understand, what do you mean? If we don't have SIMD (or another vectorization instruction set) available, I don't see how we can vectorize by hand?
Say we have SIMD and vectorized generator. Now what should we do with 4 Word32?
In the spirit of foldMUniforms
in #66 we could have an API where we provide a stream of values.
This question is important even if we don't have vectorized generator. For example is we generate Word8 we can generate 4 at time! And 32 for Bools.
Yes, maybe it would make sense as you say, even without vectorization, to have an API that provides a stream of values. This was the idea I was trying to address in #66, although I made it only for Word32.
RE SIMD support. I meant that we may just have bunch of rather low-level primitives in GHC.Whatever
where one need to do everything by hand without any compiler support.
And about API. I'm thinking about making fork of mwc-random
to experiment with API and possibly squeeze a bit of performance. mw-random
is relatively popular so API couldn't be changed will-nilly.
Using a state vector containing the information of 4 seeds, we could provide a new API where we get 4 random numbers at a time (using SIMD instructions for example).
The state vector would have to be "interleaved" to optimize cache usage, so that when reading or writing 4 elements, we read or write contiguous memory.
(I use the number 4 but it could be a different number depending on the Variate type (to build a double we need 2 Word32), and on the vectorization primitives available.)
I'm not familiar with haskell vector primitives but looking at GHC.Exts I saw some there, so I guess it is possible to go this way... What do you think?