jqwik-team / jqwik

Property-Based Testing on the JUnit Platform
http://jqwik.net
Eclipse Public License 2.0

Non uniform character distribution #416

Closed lmartelli closed 6 months ago

lmartelli commented 2 years ago

Testing Problem

Arbitrary strings, by default, consist mostly of Asian characters, because those are the most numerous and the probability distribution for choosing a random character is uniform.

Suggested Solution

Given the history of character encoding on computers, I think it would be a better default to emphasize the ASCII charset, so that an arbitrary string is more likely to contain ASCII chars and you would not have to try 10K times to have a chance of getting one that does. Maybe all ASCII chars could be considered an edge case of chars?

Discussion

Discuss advantages and disadvantages of your solution. Compare it to alternative suggestions if there are any.

jlink commented 2 years ago

First impulse: Changing the existing default behaviour would make a lot of existing properties much weaker, without people being aware of that.

You're probably aware that StringArbitrary.ascii() configures string generators to use only ASCII characters.

Maybe what's missing is an @AsciiChars constraint annotation to make it almost frictionless to start with ASCII?

adam-waldenberg commented 2 years ago

Would a @Chars(regexp = "") or something along those lines be possible?

jlink commented 2 years ago

Regexes for string generation have been on the list for a long time: https://github.com/jlink/jqwik/issues/68. I haven’t had a use case for it myself, and the implementation is not trivial, so priority has been low.

lmartelli commented 2 years ago

My point is not to only use ASCII, but to change the random distribution of chars so that ASCII chars are about as likely to appear as the rest. I wouldn't mind a global configuration option if that would break things for others.
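
To make the request concrete, here is a minimal sketch in plain Java, not jqwik code. `MixedCharSampler` and its `asciiWeight` parameter are hypothetical names invented for this illustration. It assumes the default behaves like a uniform pick over all 65,536 `char` values, which is a simplification of what jqwik actually does. Under that assumption a single char is ASCII with probability 128/65536, roughly 0.2%, while the mixed sampler with a weight of 0.5 yields ASCII about half of the time.

```java
import java.util.Random;

// Hypothetical helper for illustration only; it is not part of jqwik.
// It mixes an ASCII-only choice with a uniform choice over all char values.
final class MixedCharSampler {

    private final Random random;
    private final double asciiWeight; // probability of forcing an ASCII char

    MixedCharSampler(long seed, double asciiWeight) {
        this.random = new Random(seed);
        this.asciiWeight = asciiWeight;
    }

    // Draws one char: ASCII with probability asciiWeight, otherwise any char.
    char next() {
        if (random.nextDouble() < asciiWeight) {
            return (char) random.nextInt(128);   // U+0000..U+007F
        }
        return (char) random.nextInt(0x10000);   // any 16-bit char value
    }

    // Fraction of ASCII chars observed among `samples` draws.
    double asciiShare(int samples) {
        int ascii = 0;
        for (int i = 0; i < samples; i++) {
            if (next() < 128) {
                ascii++;
            }
        }
        return (double) ascii / samples;
    }

    public static void main(String[] args) {
        // Weight 0.0 models the uniform default, weight 0.5 the requested mix.
        System.out.println("uniform: " + new MixedCharSampler(42L, 0.0).asciiShare(100_000));
        System.out.println("mixed:   " + new MixedCharSampler(42L, 0.5).asciiShare(100_000));
    }
}
```

The weight is a constructor argument because the right balance between ASCII and everything else is a judgment call and would likely need to be configurable.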

adam-waldenberg commented 2 years ago

@lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom @Provide'r and using an arbitrary with a custom .withDistribution().

jlink commented 2 years ago

> @lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom @Provide'r and using an arbitrary with a custom .withDistribution().

I guess Arbitraries.frequencyOf(..) is easier than muddling with distributions, e.g.

Arbitraries.frequencyOf(
 Tuple.of(1, Arbitraries.strings()),         // default generator, weight 1
 Tuple.of(3, Arbitraries.strings().ascii())  // ASCII-only generator, weight 3
);

lmartelli commented 2 years ago

That could be a solution.

jlink commented 6 months ago

Closing since the above suggestion seems to solve the problem. @lmartelli Feel free to re-open.