jqwik-team / jqwik

Property-Based Testing on the JUnit Platform
http://jqwik.net
Eclipse Public License 2.0

Poor numeric domain coverage when using default arbitraries #10

Closed: rkraneis closed this issue 6 years ago

rkraneis commented 6 years ago

Testing Problem

Poor numeric domain coverage when using default arbitraries. The property evenNumbersAreEvenAndSmall is not falsified by jqwik 0.8.4 using the default jqwik.properties in the tests below:

import net.jqwik.api.Arbitraries;
import net.jqwik.api.Arbitrary;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.Provide;

import org.assertj.core.api.Condition;

import static org.assertj.core.api.Assertions.assertThat;

public class JqwikTests {

    public static final Condition<Long> EVEN = new Condition<>(x -> x % 2 == 0, "even");
    public static final Condition<Long> SMALL = new Condition<>(x -> x < 2000, "< 2000");

    @Provide
    Arbitrary<Long> evenNumbers() {
        return Arbitraries.longs().filter(l -> l % 2 == 0);
    }

    @Property
    void evenNumbersAreEven(@ForAll("evenNumbers") long evenNumber) {
        assertThat(evenNumber)
                .is(EVEN);
    }

    @Property
    void evenNumbersAreEvenAndSmall(@ForAll("evenNumbers") long evenNumber) {
        System.out.println(evenNumber);
        assertThat(evenNumber)
                .is(EVEN)
                .is(SMALL);
    }
}
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running JqwikTests
0
-9223372036854775808
8
-468
-214
284
186
... many more (993? :-)) numbers, but all roughly in the region (-500, 500)

The other tested frameworks (JunitQuickCheck, QuickTheories, VavrTest) are able to falsify this using default settings but have unusable or no shrinking.

Manually setting the range to (Long.MIN_VALUE + 1, Long.MAX_VALUE - 1) solves this, as does increasing the number of tries.
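
For reference, a sketch of that workaround (assuming the between method of jqwik's LongArbitrary, as in current versions; the exact API in 0.8.4 may have differed):

@Provide
Arbitrary<Long> evenNumbersFullRange() {
    // Explicit bounds restore coverage of (almost) the whole long range.
    return Arbitraries.longs()
            .between(Long.MIN_VALUE + 1, Long.MAX_VALUE - 1)
            .filter(l -> l % 2 == 0);
}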

Suggested Solution

The very small default test domain (defaultMaxFromTries = Math.max(tries / 2 - 3, 3)) should either be documented quite prominently or increased (at least for int/long).
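
With jqwik's default of 1000 tries, this formula explains the observed clustering (a worked sketch, assuming the default try count of 1000):

public class DefaultMaxDemo {
    public static void main(String[] args) {
        int tries = 1000;                                // jqwik's default number of tries
        long defaultMaxFromTries = Math.max(tries / 2 - 3, 3);
        System.out.println(defaultMaxFromTries);         // prints 497: values fall in [-497, 497],
                                                         // matching the observed region of (-500, 500)
    }
}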

Discussion

Documenting does not need any further discussion I guess :-)

Increasing the test domain while keeping the number of tries constant obviously spreads out the sample data, which may or may not be desirable.

jlink commented 6 years ago

Documenting the range in which random values are generated is surely necessary. I'll put that on top of my todo list.

There is one fundamental question, though, for which I haven't found a good answer yet: How do you determine a good domain range for numeric types? There are two opposing forces: a wider range covers more of the domain (and thus more potential counterexamples), while a narrower range samples the values it does cover much more densely with the same number of tries.

The PBT tools whose implementations I've had a look at so far all cut off the range at some value; they just determine that value with different formulas. I haven't found a compelling rationale for any of these formulas yet.

Do you have a suggestion for a formula that's objectively better than (tries/2 - 3)? To be frank, I've already forgotten why I added the "-3"...

What I think would be a real improvement is to choose a different probability distribution. But that is complicated enough to make me think twice about whether the effort is really worth it.

rkraneis commented 6 years ago

Yes, I fully agree that this is not an easy topic (which is why I also proposed just documenting it). Naïvely, I would think that logarithmic coverage, explicitly including <number>.MIN_VALUE, -1, 0, 1 and <number>.MAX_VALUE, might give better all-purpose coverage. You are right, though, that the other frameworks I looked at (VavrTest, JunitQuickcheck, QuickTheories and scalacheck) all use a linear distribution between <number>.MIN_VALUE and <number>.MAX_VALUE; only scalacheck also includes -1, 0 and 1. FWIW, my motivation for logarithmic coverage would be Benford's Law. But that might as well just be opinion ...
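
As an illustration of that suggestion, here is a hypothetical generator (not taken from any of the frameworks mentioned) that picks a bit length uniformly, so every order of magnitude is equally likely:

import java.util.Random;

public class LogUniformLongs {

    private static final Random RANDOM = new Random();

    // Hypothetical sketch of logarithmic coverage: choose the bit length
    // uniformly (0..62), then a uniform value of that magnitude with a
    // random sign. 0 and the extreme values would be mixed in explicitly,
    // as suggested above.
    static long nextLogUniform() {
        int bits = RANDOM.nextInt(63);
        long base = 1L << bits;
        long magnitude = base + (long) (RANDOM.nextDouble() * base);
        return RANDOM.nextBoolean() ? magnitude : -magnitude;
    }
}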

jlink commented 6 years ago

FWIW: Integral numbers already include 0, 1, -1, MIN and MAX explicitly. Decimal/floating point numbers also include a few more. You don't see them in your example b/c you filter them out.

A logarithmic coverage wouldn't have succeeded in falsifying your property either, would it? Depending on the log base that is...

rkraneis commented 6 years ago

Good catch, I did not follow the code completely through :-). The drop-off of the logarithmic distribution is actually already much too steep (log2(Long.MAX_VALUE) ≈ 63 XD). A linear (uniform) distribution would have caught it, as shown in the initial question.

rkraneis commented 6 years ago

And I have to correct myself regarding scalacheck: it actually seems to be biased towards the lower end when given explicit bounds.

jlink commented 6 years ago

I chose a different path. Random number generation now uses the whole domain between the allowed min and max values. However, the full domain is divided into several partitions so that lower numbers are generated with a higher probability. The cut-off point is determined by Math.max(tries / 2, 10).
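
A sketch of how such a partitioned generator could look (my reading of the description above, not jqwik's actual implementation; the 90/10 weighting is an assumption):

import java.util.Random;

public class PartitionedLongs {

    private static final Random RANDOM = new Random();

    static long nextPartitioned(int tries) {
        long cutOff = Math.max(tries / 2, 10);           // cut-off as described above
        if (RANDOM.nextDouble() < 0.9) {
            // high-probability partition: small numbers around zero
            return (long) (RANDOM.nextDouble() * 2 * cutOff) - cutOff;
        }
        // low-probability partition: anywhere in the full long range
        return RANDOM.nextLong();
    }
}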

@rkraneis I tried with your original example and evenNumbersAreEvenAndSmall is now successfully falsified.

jlink commented 6 years ago

Change available in version 0.8.5-SNAPSHOT