Closed lmartelli closed 6 months ago
First impulse: Changing the existing default behaviour would make a lot of existing properties much weaker, without people being aware of that.
You're probably aware that StringArbitrary.ascii() configures string generators to only use those.
Maybe what's missing is an @AsciiChars constraint annotation to make it almost frictionless to start with ASCII?
Would a @Chars(regexp = "") or something along those lines be possible?
Regex-based string generation has been on the list for a long time: https://github.com/jlink/jqwik/issues/68. I haven't had a use case for it myself, and the implementation is not trivial, so priority has been low.
My point is not to use only ASCII, but to change the random distribution of chars so that ASCII chars are about as likely to appear as the rest. If changing the default would break things for others, I wouldn't mind a global configuration option instead.
@lmartelli Did you try solving it with a provider? I think you should be able to meet your requirements with a custom @Provide method and an arbitrary that uses a custom .withDistribution().
I guess Arbitraries.frequencyOf(..) is easier than muddling with distributions, e.g.
Arbitraries.frequencyOf(
    Tuple.of(1, Arbitraries.strings()),
    Tuple.of(3, Arbitraries.strings().ascii())
);
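For reference, here is a minimal sketch of how the frequencyOf(..) suggestion could be wired into a property through a provider method. The class name, method names, and the property itself are hypothetical and only there for illustration; the 1:3 weights are taken from the snippet above. It assumes jqwik is on the test classpath.

```java
import net.jqwik.api.Arbitraries;
import net.jqwik.api.Arbitrary;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;
import net.jqwik.api.Provide;
import net.jqwik.api.Tuple;

// Hypothetical container class; the names are illustrative only.
class MostlyAsciiStringProperties {

    // Roughly 3 out of 4 generated strings are ASCII-only;
    // the rest come from the default string generator.
    @Provide
    Arbitrary<String> mostlyAsciiStrings() {
        Arbitrary<String> anyChars = Arbitraries.strings();
        Arbitrary<String> asciiOnly = Arbitraries.strings().ascii();
        return Arbitraries.frequencyOf(
            Tuple.of(1, anyChars),
            Tuple.of(3, asciiOnly)
        );
    }

    // Placeholder property showing how the provider is referenced by name.
    @Property
    boolean lengthMatchesCharArray(@ForAll("mostlyAsciiStrings") String s) {
        return s.length() == s.toCharArray().length;
    }
}
```

Note that this mixes whole strings rather than individual characters: each generated string is either ASCII-only or drawn from the default generator.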
That could be a solution.
Closing since the above suggestion seems to solve the problem. @lmartelli Feel free to re-open.
Testing Problem
Arbitrary strings, by default, generate mostly Asian characters, because they are the most numerous, and the probability distribution for choosing a random character is uniform.
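To put a rough number on this, the sketch below works out the odds under a simplifying assumption that is not taken from the jqwik source: every character is drawn uniformly and independently from all 65,536 UTF-16 code units, of which 128 are ASCII. The class and method names are hypothetical.

```java
public class AsciiOdds {

    // Assumption: uniform choice over all 65,536 UTF-16 code units, 128 of them ASCII.
    static final double ASCII_SHARE = 128.0 / 65536.0; // exactly 1/512

    // Probability that a string of the given length contains at least one ASCII char,
    // assuming every char is drawn independently.
    static double atLeastOneAscii(int length) {
        return 1.0 - Math.pow(1.0 - ASCII_SHARE, length);
    }

    public static void main(String[] args) {
        System.out.printf("ASCII share per char: %.4f%%%n", ASCII_SHARE * 100);
        System.out.printf("10-char string with at least one ASCII char: %.2f%%%n",
                atLeastOneAscii(10) * 100);
    }
}
```

Under that assumption only about one character in 512 is ASCII, so even a 10-character string contains an ASCII character only about 2% of the time.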
Suggested Solution
Given the history of character encoding on computers, I think it would be a better default to emphasize the ASCII charset, so that an arbitrary string has a higher chance of containing ASCII chars, and you would not have to try 10K times in order to have a chance to get an arbitrary string that contains an ASCII char. Maybe all ASCII chars could be considered edge cases of chars?