Open mattfidler opened 6 years ago
This would reduce the probability of having similar seeds for each core, though with the size of the uint32 and the number of cores people typically have, this still would be low.
Hi Matt, thanks for pointing this out. I wonder whether the best option would be depending directly on the sitmo package which might be setting the seed in a more sophisticated way (I haven't checked). If we believe the first approximation to the birthday problem given in Wikipedia, than the probability of a collision when using a 1000 cores is around
1 - exp(-1000^2/(2*(2^32)))
[1] 0.0001164085
with a more realistic 64 cores we have O(10^-7) probability of collision. I'll think about it.
I looked at the examples in the sitmo package and they have all the seeds specified by the user for each core. I don't think they actually do anything for this problem. I could be wrong.
As I said, the probability is low. However, the probability with 1000 cores using the sequential solution is 0.
Wouldn't the probabily increase as you make multiple calls to the mvnfast? It generates a seed from the uniform distribution every time it starts the mvnfast number generation. Still be low, but it isn't only based on the number of threads.
Hi Matt,
ok, hopefully this 5094d5a25f00f505e0f472b6d2990a1f218db099 addresses your concerns.
Indeed it is better. Now it only depends on how many times you call mvnfast.
The only way to completely overcome it is to create your own seed setting routine or reading the R seed's state and using the R seed (I think that would work...?)
According to https://github.com/wch/r-source/blob/3105967d568ce6e411c63a0cbecabaec695b5cc0/src/library/base/man/Random-user.Rd
You could read the seed state by:
user_unif_nseed
and using the R's seed as sitmo's seed. You have to advance this seed somehow probably by a throw-away uniform random number.
If you are interested in this approach, it wouldn't be too hard to adapt...?
Also, I have heard that the construct
for (unsigned int j = ncores-1; j--;){
seeds[j] = seeds[j+1]-1;
if (seeds[j] == 0) seeds[j] = std::numeric_limits<uint32_t>::max()-1;
}
Takes slightly less operations than a full for loop, and therefore should be slightly faster. On modern machines, it shouldn't matter too much, but that is why I chose this loop style.
Now the reference:
https://www.thegeekstuff.com/2015/01/c-cpp-code-optimization/
Item 7 talks about the for-loop optimization above.
Interesting problem. One alternative could be to use the same seed value on all cores but pre-set the internal counter. For example, in dqrng
I use
template<>
inline void random_64bit_wrapper<sitmo::threefry_20_64>::seed(result_type seed, result_type stream) {
gen.seed(seed);
gen.set_counter(0, 0, 0, stream);
cache = false;
}
This way I get 2^64 streams with 3 64bit integers per-stream for counting.
I do something similar for parallel solving in rxode2
I was looking at your code, and noticed
Have you considered https://www.johndcook.com/blog/2016/01/29/random-number-generator-seed-mistakes/
Perhaps instead:
While it may not be quite as fast as your solution, it can avoid the birthday problem