whatyouhide / stream_data

Data generation and property-based testing for Elixir. 🔮
https://hexdocs.pm/stream_data
880 stars 66 forks source link

Specify fixed examples that should always be run #22

Closed tmbb closed 7 years ago

tmbb commented 7 years ago

From this discussion: https://github.com/whatyouhide/stream_data/issues/18#issuecomment-322045634

When testing a function on integers, I want to test it on -1, 0 and 1, values that often break the developers' expectations.

When testing a function on binaries, I'd want to test it on the empty binary "". The same for the empty list, the empty tuple and the empty map.

Currently, these values are "likely" to be generated, but I think there should be stronger guarantees (they should be always tested).

josevalim commented 7 years ago

I am a bit torn on this because it partially defeats the purpose of random testing. Does any of the other property testing libraries do this? It is important to learn from others but I don't believe we should base all decisions on a single library. --

José Valim www.plataformatec.com.br Skype: jv.ptec Founder and Director of R&D

tmbb commented 7 years ago

I thought I had read that Hypothesis (python) did this, but I can't find it in the docs and I can't find it in the source code either... Maybe they've changed something or maybe I'm just confusing it with something else. I'm sorry.

For floats they do have a list of NASTY_FLOATS which is given a higher weight, but they don't guarantee that those floats will appear: https://github.com/HypothesisWorks/hypothesis-python/blob/master/src/hypothesis/searchstrategy/numbers.py#L92

I've read that part of the source a long time ago, and maybe I'm just confusing this with some kind of deterministic guarantee.

They probably trust the shrinking system to find these edge cases, maybe? I'll ask the author.

If no one else seems to be doing this, then neither should StreamData.

tmbb commented 7 years ago

I've asked the author and he said Hypothesis has never done such a thing. I'm extremely embarrassed for having made you think it did. The generators in hypothesis are tweaked so that problematic values appear more often, but they never make sure that a value will appear in a given test run.

In the case of floats, they're biased because of the NASTY_FLOAT list, for example. In the case of integers, they seem to be biased toward zero because they're drawn from a geometric distribution. Lists are also biased toward zero because they too are built according to a geometric distribution. So everything is biased as needed but never deterministic. That might have been my confusion.

tmbb commented 7 years ago

Currently floats and integers in StreamData are drawn from uniform distributions, aren't they? I wonder of they could use some bias.

I can always define my own (arbitrarily biased) generators, so this is really a question about what the default should be.

It's always hard to decide which should be the proper distribution of random data with which to test arbitrary programs, and I don't really have an evidence-based answer for this :/

whatyouhide commented 7 years ago

@tmbb let's leave floats out of the discussion since we're discussing changes to them but as far as integers go, they are drawn from a uniform distribution but you are not taking into account the generation size which is very important. When starting the test run we start with a small size (1) and increase the size by 1 on each iteration. Most generators use the size to determine the size of the space they generate values out of, so for int/0, the size determines the range -size..size where integers are generated out of. This means that for the first runs, it's very likely that one of 1, 0, or -1 are generated.

tmbb commented 7 years ago

you are not taking into account the generation size which is very important

Hm... This is certainly enough bias for integers (maybe even too much?). I was indeed looking at the generator in isolation and not taking into account the fact that the generator size would grow with time.

The probability that the number k is generated at least once will be (product replaced by sum of logs):

def p(k, n) when abs(k) <= n, do:
  1 - :math.exp(Enum.sum(for i <- max(abs(k), 1)..n, do: -:math.log(1 + 1/(2*i))))
def p(_, _), do: 0.0

The formula above was derived on the keyboard on the go (I have no paper at hand right now), so it might have mistakes, or maybe it can be simplified further.

It generates 0, 1 and -1 at least once with probability > 0.91 each for 100 trials, which is acceptable. This generator might even cluster things around the origin too much. Note that it outright forbids large numbers from being generated). Similar considerations might apply to lists.

Hm... I'll draw some plots with matplotlib to get some intuition on the shape of the distributions and share them when they're ready.

tmbb commented 7 years ago

Please correct me if I'm wrong, but:

  1. The default maximum of runs is 100
  2. The generator size increments by 1 for each run
  3. Generated integers (by default) belong to size..size

Doesn't this mean that by default no integers > 100 will be generated? Is this the intended behavior? Shouldn't we generate a couple of very large integers (say, > 216) once in a while?

whatyouhide commented 7 years ago

@tmbb yes it is correct, we never generated outside of size so outside of 100 by default. Yes it is the intended behaviour. I'm not sure we should generate huge integers in int/0 because int/0 is used and is meant to be used as basis for many other generators. For example, you could bind on int/0 to get the length of something you need to generated, and then generated that something. Or you could do bind(int(), fn i -> String.duplicate(" ", i) end). Using 2^16 for these cases would likely be a problem. I think a good alternative could be large_int() or something like this, which generates integers in a huge space instead of -size..size. I think it's worth noting that in the BEAM I don't think it's too common to have different behaviours given the size of the integers given integers are not fixed-memory, so there are no upper/lower limits on what you can generated and by consequence functions that work with integers tend to just work with all integers (the arithmetic functions come to mind here).

tmbb commented 7 years ago

Yes, the BEAM is "weird" regarding numeric types, at least when comparing to other more common runtimes. I think there might be a role for large integers in testing. Say I implemented ranges (n..m) by actually building the list when all I wanted was to test whether an integer belongs to the range. Trying huge ints might come up with something (especially if you measure run time or memory consumption somehow during testing).

So there should definetely be a large_int() generator (with a distribution that makes sense). The question is: which one should be the default? The small one or the big one? I thnk it should be the big one (if your integer is meant to build lists, then document that fact by explicitly using the small one!), but I don't really have syrong opinions on this.

josevalim commented 7 years ago

A large_int generators makes sense. I agree with Andrea generating large ones by default can be expensive on their use. --

José Valimwww.plataformatec.com.br http://www.plataformatec.com.br/Founder and Director of R&D

tmbb commented 7 years ago

@josevalim I don't understand. What do you mean by "generating large ones by default"? I'm not proposing using large integers to build data structures. What I've tried to say is:

  1. The current int() generator should be renamed to something like small_int() so that it's clear that it should be used for data structures. I wonder if it should be renamed to size() instead. That would make it pretty obvious that if you want to generate something of a certain size, oyu sohuld use this one.
  2. We should write a new generator, named int(), which would generate both small and big integers and make it clear in te docs that this sohuld not be used as a size for generators. This int sohuld be used for arithmetic operations and similar functions.

Where we're currently using int() we would use small_int/size instead. By default, we would use the small integer generator, only under a different name.

What do you think?

josevalim commented 7 years ago

We are in agreement. I am not sure though which default is better. INT and BIGINT or INT and SMALLINT.

--

José Valimwww.plataformatec.com.br http://www.plataformatec.com.br/Founder and Director of R&D

mkaszubowski commented 7 years ago

@josevalim Quixir by Dave Thomas does something similar. You can pass must_have option to the generator, but there often are some defaults: https://github.com/pragdave/quixir#list-of-type-generators

Personally, I think it's quite a good idea. While you want most of the values to be random, some values might be considered a bit "special" because, as @tmbb said, they often are the values that break the code (an empty list being a great example for a function that uses hd(list)).

whatyouhide commented 7 years ago

@tmbb How would this look like API-wise? Consider this (convoluted on purpose) example:

check all list <- list_of({integer(), integer()}, min_length: 1),
          {a, b} <- member_of(list) do
  assert {a, b} in list
end

Where would you specify fixed examples? Would you specify fixed examples for list, a, and b? We don't store what variables are generated so that would be hard. Would you specify fixed examples for list and then have member_of(list) still be randomized?

Looking at this from another side, wouldn't it be simpler to write a normal tests for examples that you know are fixed? For example:

# For non-empty lists
check all list <- list_of(integer(), min_length: 1),
          elem <- member_of(list) do
  assert elem in list
end

# For empty lists
check all elem <- integer() do
  refute elem in []
end

Thoughts?

tmbb commented 7 years ago

@whatyouhide I agree this isn't worth it. The probabilistic guarantees are enough.

whatyouhide commented 7 years ago

I'm gonna close this off for now then. :) Thanks for the discussion!