mtedone / podam

PODAM - POjo DAta Mocker
https://mtedone.github.io/podam
MIT License
326 stars 749 forks source link

PodamUtils.getIntegerRange() produces non-uniformly distributed values #236

Closed Otanikotani closed 7 years ago

Otanikotani commented 7 years ago

Code, it simply counts frequencies of random values from 0 to 2 inclusively.

    @Test
    public void test_() throws Exception {
        Map<Integer, Integer> counters = new HashMap<>();
        counters.put(0, 0);
        counters.put(1, 0);
        counters.put(2, 0);
        for (int i = 0; i < 10_000_000; i++) {
            int result = PodamUtils.getIntegerInRange(0, 2);
            Integer counter = counters.get(result) + 1;
            counters.put(result, counter);
        }
        System.out.println(counters);
    }

Result:

{0=2500466, 1=5000374, 2=2499160}

Is it expected behaviour? For range from 0 to 10 same code generates:

{0=500043, 1=999784, 2=1001485, 3=1001217, 4=998733, 5=999781, 6=1000986, 7=999542, 8=998796, 9=999932, 10=499701}

Which looks a little bit weird and not fair!

daivanov commented 7 years ago

Yes, this is an effect of rounding nobody was paying attention to earlier. Not sure what is an expected behaviour here, but I could imagine it might affect testing of marginal values.

daivanov commented 7 years ago

Snapshot 7.0.5-2.

Otanikotani commented 7 years ago

@daivanov May I ask you why implementation

new Random().nextInt((max - min) + 1) + min;

Is not used? Why is it important to add and substract 0.5?

daivanov commented 7 years ago

And why should it be used? ;)

I did a quick profiling and your code takes 43703 ms per billion iterations, and current implementation does it in 20566 ms, which is 53% faster.

Otanikotani commented 7 years ago

Well, did you always create a new instance of Random?

daivanov commented 7 years ago

Sure, because it was part of the question.

Otanikotani commented 7 years ago

Benchmark:

public class Randoms {

    @Benchmark
    public Integer current() {
        return PodamUtils.getIntegerInRange(0, 1000);
    }

    @Benchmark
    public Integer rangeFromSo() {
        return PodamUtils.getIntegerRangeDefaultVersionFromStackOverflow(0, 1000);
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(Randoms.class.getSimpleName())
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

Results:

Benchmark             Mode  Cnt         Score        Error  Units
Randoms.current      thrpt   20  35493740.187 ± 304428.342  ops/s
Randoms.rangeFromSo  thrpt   20  75697058.949 ± 554676.207  ops/s

Where methods are like this:

    public static int getIntegerInRange(int minValue, int maxValue) {

        return (int) (getDoubleInRange(minValue - 0.5, maxValue + 0.5 - (1 / Integer.MAX_VALUE)) + 0.5);
    }

    public static int getIntegerRangeDefaultVersionFromStackOverflow(int minValue, int maxValue) {
        return RNG.nextInt((maxValue - minValue) + 1) + minValue;
    }

and RNG is static final variable.

Otanikotani commented 7 years ago

But why would you measure performance for code which is not optimized? Of course it would lose if you create an instance every time. The point of that version was to show that it is simply more readable without casts, division and strange values like 0.5.

daivanov commented 7 years ago

And can you do this for long?

Otanikotani commented 7 years ago

It is possible to do without implementing your own version in Java 7 or using apache commons math.

daivanov commented 7 years ago

But you also should tell why things should be done. People should know motivational part of the proposal.

daivanov commented 7 years ago

I will resolve this issue, if you don't mind.

Otanikotani commented 7 years ago

Yea, sure, it is all only speculative, I don't think that normalized distribution or performance of random generator is that important anyway.

daivanov commented 7 years ago

The distribution is not normal, now it is uniform. Performance is of course important, but it's not easy to improve it.

daivanov commented 7 years ago

Released in Podam 7.0.5