jqwik-team / jqwik

Property-Based Testing on the JUnit Platform
http://jqwik.net
Eclipse Public License 2.0
576 stars 64 forks source link

Repeatead chars not working as expected #506

Closed esfomeado closed 1 year ago

esfomeado commented 1 year ago

Testing Problem

I had an issue where jqwik is generating duplicated values frequently which, for my example should almost never occur.

I have noticed that these duplicated values are always the same character e,g, AAAAAA so I decided to set the repeatedChars to 0 but this type of values are still generated. I have tried the other values but seems that there is no impact on the ammount of time a repeated char is used.

    @Test
    void bug() {
        StringArbitrary strings = Arbitraries.strings()
                .alpha()
                .ofMinLength(5)
                .ofMaxLength(25)
                .repeatChars(0);

        List<String> names = IntStream.range(0, 100)
                .mapToObj(i -> strings.sample())
                .toList();

        Map<String, Long> duplicateCount = names.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));

        List<String> duplicateStrings = duplicateCount.entrySet().stream()
                .filter(entry -> entry.getValue() > 1)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());

        System.out.println("Duplicate strings: " + duplicateStrings);

        assertThat(duplicateStrings).isEmpty();
    }
jlink commented 1 year ago

@esfomeado Thanks for raising that issue. I see why you‘d expect that repeatChars(0) to not create any duplicate chars at all. Currently it just sets the probability of explicitly adding duplicate chars to 0. The effect you see is due to edge case resolution.

I’ll have to think about if changing the behaviour or documentation of repeatChars is the right thing to do. Maybe an additional uniqueChars() would be even clearer.

To solve your problem short-term, you can use [withoutEdgeCases()](https://jqwik.net/docs/1.7.4/javadoc/net/jqwik/api/Arbitrary.html#withoutEdgeCases()) on the strings arbitrary:

Arbitraries.strings().alpha()
           .ofMinLength(5).ofMaxLength(25)
           .repeatChars(0.0)
           .withoutEdgeCases();

This will prevent „AAAAA“ to be generated on purpose. Duplicate chars will still occur by chance though.

If you definitely want no duplicates, you’d currently have to take a detour through something like:

Arbitraries.chars().alpha().list()
           .ofMinSize(5).ofMaxSize(25)
           .uniqueElements()
           .map(chars -> chars.stream().map(String::valueOf).collect(Collectors.joining()));
jlink commented 1 year ago

I decided to introduce an experimental new API method: StringArbitrary.uniqueChars()

The open question is if StringArbitrary.repeatChars(0.0) should log a warning or just call uniqueChars implicitly, or both?

@esfomeado What do you think?

esfomeado commented 1 year ago

Thanks for the quick response.

@jlink Personally I would expect to have the same behaviour was uniqueChars when set to 0 so I would say to just call uniqueChars implicitly.

jlink commented 1 year ago

The StringArbitrary.uniqueChars() functionality and the fix for repeatChars(0.0) is available in 1.8.0-SNAPSHOT.

jlink commented 1 year ago

In addition, there's now an annotation @UniqueChars that can be applied to any String parameter in a property.

jlink commented 1 year ago

New features available in 1.8.0-SNAPSHOT