Closed: achan001 closed this issue 6 years ago
Why is the variable 'leftover' able to reject unbiased randoms?
Uh, I meant to reject biased randoms.
random32bit = pcg32_random();
multiresult = (uint64_t) random32bit * range; // 64-bit product
leftover = (uint32_t) multiresult;            // low 32 bits
return multiresult >> 32; // [0, range)
The returned result looks just like your modulo-reduction trick.
As you mentioned in the blog, the above code works very well, especially for small ranges. So, the question is how much the added sample rejection code contributed.
The variable 'leftover' seems like a random number (an MCG) to me ...
> So, the question is how much the added sample rejection code contributed.
The question we ask is "given that we have a perfect source of random numbers, can you prove mathematically that the ranged result is unbiased". There is no "how much" question in this context. Results are either biased or unbiased.
A separate question is whether the bias matters. Well. That depends, doesn't it? Some applications might be super sensitive to biases, others might not care.
> So, the question is how much the added sample rejection code contributed.
I phrase it that way because the code will generate bias, even with a true random source. Even your comment for pcg32_random_bounded_divisionless() says so (shuffle.c, line 55):
// map random value to [0,range) with slight bias, redraws to avoid bias if needed
static inline uint32_t pcg32_random_bounded_divisionless(uint32_t range) { ...
So, the code will return random numbers with a slight bias. The sample rejection code contributes the benefit of redrawing.
I don't follow the criteria for redrawing, or how much it contributes. The code implies there is a line between acceptable and unacceptable bias. It is unclear where the line is ...
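For reference, the rejection criterion under discussion can be sketched like this (a sketch, not the exact shuffle.c code; next32() is a stand-in for pcg32_random(), implemented here as a simple LCG so the snippet is self-contained):

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lcg_state = 12345u;

// Stand-in 32-bit source (NOT pcg32): a full-period 32-bit LCG,
// used only to make the sketch self-contained.
static uint32_t next32(void) {
    lcg_state = lcg_state * 747796405u + 2891336453u;
    return lcg_state;
}

// Multiply-shift reduction with rejection: redraw whenever the low
// 32 bits ('leftover') fall below threshold = 2^32 mod range.
// The cheap pre-screen 'leftover < range' works because
// threshold < range, so rejection is only possible in that case.
static uint32_t bounded(uint32_t range) {
    uint64_t multiresult = (uint64_t) next32() * range;
    uint32_t leftover = (uint32_t) multiresult;
    if (leftover < range) {                   // pre-screen
        uint32_t threshold = -range % range;  // 2^32 mod range
        while (leftover < threshold) {        // redraw to remove bias
            multiresult = (uint64_t) next32() * range;
            leftover = (uint32_t) multiresult;
        }
    }
    return (uint32_t) (multiresult >> 32);    // in [0, range)
}
```

The rejection region has size threshold = 2^32 mod range, so the redraw is rare for small ranges.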
I made two programs to show that the code seems to work. This one just fills the slots using the modulo-reduction trick:
#include <stdio.h>  // ranged_slot.c
#include <stdlib.h>
int main(int argc, char **argv)
{
    unsigned range = strtoul(argv[1], NULL, 10);
    int slot[range];
    for (int i = 0; i < range; i++) slot[i] = 0;
    unsigned rand = 0;
    unsigned long long rand64 = 0;
    do {
        slot[(int)(rand64 >> 32)]++;
        rand64 += range;        // rand64 = rand * range
    } while (++rand);           // all 32-bit randoms
    for (int i = 0; i < range; i++) // show the slots
        printf("slot[%d] = %d\n", i, slot[i]);
    return 0;
}
This one shows what gets rejected:
#include <stdio.h>  // ranged_reject.c
#include <stdlib.h>
#define SCALED(x, n) (unsigned) (((unsigned long long) (x) * (n)) >> 32)
int main(int argc, char **argv)
{
    unsigned range = strtoul(argv[1], NULL, 10);
    unsigned threshold = -range % range;
    unsigned rand = 0, leftover = 0;
    printf("leftover < (threshold = %u):\n", threshold);
    do {
        if (leftover < threshold)
            printf("0x%08x --> %u\n", rand, SCALED(rand, range));
        leftover += range;      // leftover = (rand * range) mod 2^32
    } while (++rand);           // all 32-bit randoms
    return 0;
}
For range = 6, all slots (after sample rejection) are equally likely:
C:\tmp>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882
C:\tmp> ranged_reject 6
leftover < (threshold = 4):
0x00000000 --> 0
0x2aaaaaab --> 1
0x80000000 --> 3
0xaaaaaaab --> 4
Given your last comment, I think I now understand your point.
Ok. So you get a biased return...
>ranged_slot 6
slot[0] = 715827883
slot[1] = 715827883
slot[2] = 715827882
slot[3] = 715827883
slot[4] = 715827883
slot[5] = 715827882
Some values are more likely than others. As I point out in my blog post, the difference between frequencies is at most one, so for small ranges, it is a "slight" bias.
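This at-most-one difference is easy to check exhaustively at a smaller word size. The sketch below (my own 8-bit scale-down, not one of the thread's programs) applies the same multiply-shift reduction to all 256 8-bit values; since 256 = 6*42 + 4, four slots receive 43 values and two receive 42:

```c
#include <assert.h>
#include <stdint.h>

// Exhaustive 8-bit analogue of ranged_slot.c: slot(x) = (x*range) >> 8
// for every x in [0, 256). counts[] must hold at least 'range' entries.
static void count_slots_8bit(uint32_t range, uint32_t counts[]) {
    for (uint32_t s = 0; s < range; s++) counts[s] = 0;
    for (uint32_t x = 0; x < 256; x++)
        counts[(x * range) >> 8]++;
}
```

For range = 6 this yields 43, 43, 42, 43, 43, 42: the same shape as the 32-bit output above, with frequencies differing by at most one.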
Try this program (with rejection)...
#include <stdio.h>  // ranged_slot.c (with rejection)
#include <stdlib.h>
#include <stdint.h>
int main(int argc, char **argv)
{
    uint32_t range = strtoul(argv[1], NULL, 10);
    uint32_t slot[range];
    for (int i = 0; i < range; i++) slot[i] = 0;
    uint32_t rand = 0;
    uint64_t rand64 = 0;
    uint32_t threshold = -range % range;
    do {
        if ((rand64 & 0xFFFFFFFF) >= threshold) slot[(uint32_t)(rand64 >> 32)]++;
        rand64 += range;        // rand64 = rand * range
    } while (++rand);           // all 32-bit randoms
    for (int i = 0; i < range; i++) // show the slots
        printf("slot[%d] = %u\n", i, slot[i]);
    return 0;
}
We get an unbiased result...
$ ./ranged_slot 6
slot[0] = 715827882
slot[1] = 715827882
slot[2] = 715827882
slot[3] = 715827882
slot[4] = 715827882
slot[5] = 715827882
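The same unbiased outcome can be checked exhaustively at 8-bit scale (again my own scale-down, with threshold = 256 mod range playing the role of 2^32 mod range):

```c
#include <assert.h>
#include <stdint.h>

// 8-bit analogue of the rejecting program: drop any x whose low byte
// of x*range falls below threshold = 256 mod range, then count slots.
static void count_slots_8bit_reject(uint32_t range, uint32_t counts[]) {
    uint32_t threshold = 256 % range;
    for (uint32_t s = 0; s < range; s++) counts[s] = 0;
    for (uint32_t x = 0; x < 256; x++) {
        uint32_t leftover = (x * range) & 0xFF;  // low 8 bits
        if (leftover >= threshold)
            counts[(x * range) >> 8]++;
    }
}
```

For range = 6 every slot ends up with exactly (256 - 4) / 6 = 42 values.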
You seem to want me to tell you when a bias is ok. I don't have an answer to this question.
I'll close this, reopen if I have not answered to your satisfaction.
With your revised slots + rejection program, 6 slots are all equally likely.
Is it always true? (for all 32-bit ranges)
Brute force to prove it is always true would probably take too long. Besides, it will not explain WHY and HOW it works.
I understand that the rejected-case count is the same as in pcg32_random_bounded() (leftover = (uint32_t)(rand * range) is a bijection of rand, rejection rate = threshold / 2^32).
But why do the rejected cases happen to be in the slots with excess counts? It is probably not an accident.
An explanation is still welcome.
> With your revised slots + rejection program, 6 slots are all equally likely.
Yes.
> Is it always true? (for all 32-bit ranges)
Yes.
> Brute force to prove it is always true would probably take too long.
Definitively.
> Besides, it will not explain WHY and HOW it works.
Indeed. I am working on a short paper right now. It is not yet ready. I will publish it soon.
> I understand that the rejected-case count is the same as in pcg32_random_bounded() (leftover = (uint32_t)(rand * range) is a bijection of rand, rejection rate = threshold / 2^32).
Yes. All these algorithms rely on the same math.
> But why do the rejected cases happen to be in the slots with excess counts? It is probably not an accident.
Indeed, that is no accident. It is the desired result.
> An explanation is still welcome.
Email me at lemire@gmail.com and I will share the draft of my paper.
// map random value to [0,range) with slight bias, redraws to avoid bias if needed
You might consider changing the above comment (shuffle.c, line 55). I had misread it to imply that the code accepts a slight bias, and redraws only if the bias is unacceptable.
This might be better:
// map random value to [0,range) without bias (by redraws if needed)
You are correct. Done.
I think I can visualize why the code removes all bias. (Note: not a proof.)
After the threshold values are rejected, the number of remaining values is divisible by range.
So, if rejection correctly picks the threshold values, the result has no bias.
The reason it rejects the slots with excess counts is that all the rejected cases
are very, very close to the edge between two slots. It almost seems as if they could fall into
the wrong slot. The edge cases just barely squeeze into that particular slot.
So, if we have to reject anything, pick the edge cases.
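This picture can be checked exhaustively at 8-bit scale (my own illustration): the values rejected by 'leftover < threshold' fall precisely in the over-full slots, so after rejection every slot holds exactly floor(256 / range) values:

```c
#include <assert.h>
#include <stdint.h>

// Returns 1 if, at 8-bit scale, removing the rejected values
// (low byte of x*range below threshold = 256 mod range) leaves every
// slot with exactly floor(256/range) values -- i.e. rejection hits
// exactly the slots that held one extra value.
static int rejects_match_excess_8bit(uint32_t range) {
    uint32_t threshold = 256 % range;
    uint32_t kept[256] = {0};
    for (uint32_t x = 0; x < 256; x++)
        if (((x * range) & 0xFF) >= threshold)
            kept[(x * range) >> 8]++;
    for (uint32_t s = 0; s < range; s++)
        if (kept[s] != 256 / range) return 0;
    return 1;
}
```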
> After the threshold values are rejected, the number of remaining values is divisible by range.
That's the gist of the proof.
Thanks.
Since the possible values are evenly spaced, the slot with the edge case allows the most room for the other values.
Uh, I think I just proved it ...
I think you did, indeed.
To confirm my edge-case idea is valid, I tried the top edge instead of the bottom:
uint32_t unbiased = ~(-range % range); // unbiased = ~threshold
while (leftover > unbiased) ... // rejects the same slots as leftover < threshold
For some unknown reason, top-edge rejection sometimes runs faster.
ranged_reject.c (post 5), range = 6:
speedup = 5.393 sec / 3.595 sec = 1.501X, or 50% faster
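The equivalence of the two edges can also be verified exhaustively at 8-bit scale (my own check): per slot, top-edge rejection (leftover > ~threshold) discards exactly as many values as bottom-edge rejection (leftover < threshold):

```c
#include <assert.h>
#include <stdint.h>

// At 8-bit scale, count per-slot rejections for both edges:
// bottom edge: leftover < threshold; top edge: leftover > ~threshold
// (8-bit complement, written 0xFF ^ threshold). Returns 1 if every
// slot loses the same number of values either way.
static int edges_match_8bit(uint32_t range) {
    uint32_t threshold = 256 % range;
    uint32_t bottom[256] = {0}, top[256] = {0};
    for (uint32_t x = 0; x < 256; x++) {
        uint32_t slot = (x * range) >> 8;
        uint32_t leftover = (x * range) & 0xFF;
        if (leftover < threshold) bottom[slot]++;
        if (leftover > (0xFFu ^ threshold)) top[slot]++;
    }
    for (uint32_t s = 0; s < range; s++)
        if (bottom[s] != top[s]) return 0;
    return 1;
}
```

Within one slot the leftovers form an arithmetic progression with step = range, so the smallest leftover is below threshold exactly when the largest is above ~threshold: both tests pick out the same over-full slots.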
It seems unlikely that you'd see a 50% speed difference. I will start with the usual warning: never benchmark on a laptop... always benchmark on a server configured for testing.
I cannot reproduce your speed difference. Try this gist:
https://gist.github.com/lemire/0ead15045a4c174799338b231bacf199
$ gcc -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range) : 4.799 cycles per operation (best) 4.875 cycles per operation (avg)
flippedbase(buffer,len,range) : 4.790 cycles per operation (best) 4.799 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range) : 4.798 cycles per operation (best) 4.874 cycles per operation (avg)
flippedbase(buffer,len,range) : 4.793 cycles per operation (best) 4.799 cycles per operation (avg)
$ clang -O3 -o isflipped isflipped.c
$ ./isflipped
initbase(buffer,len,range) : 6.029 cycles per operation (best) 6.030 cycles per operation (avg)
flippedbase(buffer,len,range) : 6.029 cycles per operation (best) 6.051 cycles per operation (avg)
$ ./isflipped
initbase(buffer,len,range) : 6.029 cycles per operation (best) 6.030 cycles per operation (avg)
flippedbase(buffer,len,range) : 6.029 cycles per operation (best) 6.030 cycles per operation (avg)
ranged_reject.c (post 5) does not have a PRNG (not even slots!).
With the cost of a PRNG added, the effect of top edge vs bottom edge is just noise.
My goal was to confirm that the top edge rejects the same slots ... it does! The speedup is just a curious observation (with the strawberryperl.com-supplied gcc 7.1.0).
Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong.
The pre-screened cases must include all the top edge cases.
It should be like this:
if (leftover > ~range) ... // pre-screening
If range > 0, the above can be optimized a bit more:
if (leftover > -range) ... // pre-screening, range > 0
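At 8-bit scale one can confirm this exhaustively (my own check; here ~threshold and -range are 8-bit quantities): every leftover rejected by the top-edge test also passes the pre-screen leftover > -range, so the pre-screen never skips a case that needs a redraw:

```c
#include <assert.h>
#include <stdint.h>

// 8-bit check of the pre-screen for top-edge rejection: rejection
// means leftover > ~threshold; since threshold < range, any such
// leftover also satisfies leftover > -range (256 - range in 8 bits),
// so the cheap pre-screen covers every rejected case.
static int prescreen_covers_8bit(uint32_t range) {
    uint32_t threshold = 256 % range;
    for (uint32_t leftover = 0; leftover < 256; leftover++) {
        int rejected = leftover > (0xFFu ^ threshold);  // 8-bit ~threshold
        int prescreened = leftover > 256u - range;      // 8-bit -range
        if (rejected && !prescreened) return 0;  // pre-screen missed one
    }
    return 1;
}
```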
> Your pre-screening test for pcg32_random_bounded_divisionless_flipped is wrong.
Right. It still does not affect the running speed.
> With the cost of a PRNG added, the effect of top edge vs bottom edge is just noise.
Makes sense!
You might consider adding this top edge case to your unfinished draft.
It shows that the threshold rejection test is just picking the edge cases.
Which edge cases to pick does not matter (as long as they are all on the same side).
@achan001 Mathematically, this needs to be pointed out, but there is no reason to have two distinct implementations unless one has benefits, somehow.
Sorry for the delay, I posted the reference tonight:
Thank you.
I peeked at the code for pcg32_random_bounded_divisionless().
Why is the variable 'leftover' able to reject unbiased randoms?
No bias if leftover >= range, or if the (updated) leftover >= threshold.
I don't follow the above logic ...
Does this produce unbiased ranged numbers?
Or is it an approximation, using your idea from another blog post:
https://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/