Open sobolevn opened 1 year ago
Right now I've ported and modified several `binascii` tests from https://github.com/Zac-HD/stdlib-property-tests/blob/master/tests/test_encode_decode.py :)
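For readers following along, a minimal sketch (not the actual ported code) of the kind of round-trip property those `binascii` tests check, assuming plain `hypothesis` is installed:

```python
import binascii

from hypothesis import given, strategies as st


@given(st.binary())
def test_base64_round_trip(payload):
    # Property: decoding the encoded form must give back the original bytes.
    encoded = binascii.b2a_base64(payload)
    assert binascii.a2b_base64(encoded) == payload
```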
I'd suggest looking at PyPy's property-based tests, since they're known to be useful for developers of a Python implementation.
I'd also reconsider (3). While complicated strategies are obviously more work to develop and use, in my experience they're also disproportionately likely to find bugs - precisely because code working with complicated data structures is even more difficult to test without Hypothesis, and so there are usually weird edge cases.
Examples of this effect include https://github.com/python/cpython/issues/84838, https://github.com/python/cpython/issues/86384, https://github.com/python/cpython/issues/89901, and https://github.com/python/cpython/issues/83134. The `hypothesmith` source-code strategies are at a pre-alpha / proof-of-concept stage and have more third-party dependencies, so I'm not sure they'd be a good fit for CPython at this time, but you get the idea. I don't have capacity to implement it myself, but could supervise a volunteer or contractor to build a production-ready strategy if there's interest.
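Purely as a sketch of the idea (not something proposed for the CPython suite), this is the kind of property a third-party source-code strategy enables:

```python
import hypothesmith  # third-party, pre-alpha strategies for Python source
from hypothesis import given


@given(hypothesmith.from_grammar())
def test_generated_source_compiles(source_code):
    # Property: grammar-derived source should compile without crashing the
    # compiler; an unexpected exception here is worth investigating.
    compile(source_code, "<string>", "exec")
```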
Personally, I don't think we should go down this path. Hypothesis has a certain amount of randomness to it. In general, our tests should be specifically designed to cover every path of interest.
When someone is designing new code, it is perfectly reasonable to use Hypothesis. However, for existing code, "let's use Hypothesis" isn't a goal directed at a specific problem. One issue we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, and the invariants they mentally invent aren't the actual invariants: they reflect bugs in their own understanding rather than actual bugs in the code. For example, we got reports on `colorsys`
conversions not being exactly invertible; however, due to color gamut limitations they can't always be inverted (information is lost), and the conversion tables were dictated by published standards that weren't invertible. We also got false reports on the random module methods by people who just let Hypothesis plug in extreme values without any thought of what the method was actually supposed to do in real use cases. There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.
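As an illustration of that failure mode (a hedged sketch, not a test anyone proposes to check in), a naively strict round-trip property against `colorsys` fails because of floating-point rounding and lossy conversion, not because of a bug:

```python
import colorsys

from hypothesis import given, strategies as st

unit_interval = st.floats(min_value=0.0, max_value=1.0)


@given(unit_interval, unit_interval, unit_interval)
def test_naive_yiq_round_trip(r, g, b):
    # This "invariant" is wrong: exact equality cannot hold for a lossy,
    # floating-point color conversion, so any failure here is noise.
    assert colorsys.yiq_to_rgb(*colorsys.rgb_to_yiq(r, g, b)) == (r, g, b)
```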
It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage up our test suite, make it run less deterministically, and not explicitly write out all the cases being covered.
Hello, I am interested in learning more about this and adding proposals soon too, but I want to make sure I am looking at the same thing. I searched https://www.google.com/search?q=python+hypothesis and it shows https://hypothesis.readthedocs.io/en/latest/ and https://pypi.org/project/hypothesis/, is this correct?
Also I am trying to understand this conversation. Based on my understanding, software should ideally be made easily maintainable and testable, and part of this includes making tests that always find problems, no? But based on the conversation, it looks to me like these hypothesis tests are a bit random and will usually catch the bugs, but not always? If so, why not catch the bugs always instead of only catching them sometimes?
Also, to clarify:
> Tests should be suitable for the property-based nature of `hypothesis`
This "property-based" is refer to the same as https://docs.python.org/3/library/functions.html#property?
Anyways, one proposal I have is to write some code that generates all combinations of valid inputs and tests them to make sure the code works correctly. I think we could also do permutations of valid inputs, but I'm not sure whether that adds extra usefulness (and the tests would become slower, since the number of permutations is usually much larger than the number of combinations).
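For a very small input space, "all combinations of valid inputs" can indeed be spelled out exhaustively; a sketch of that idea is below (the `binascii` hex functions are just an illustrative target). The catch is that the number of combinations explodes quickly, which is exactly where Hypothesis's sampling of the input space comes in.

```python
import binascii
import itertools


def test_all_two_byte_inputs():
    # Exhaustive: every combination of two byte values (256 * 256 = 65,536
    # cases). One more byte is already ~16.7 million cases, so enumerating
    # "all combinations" only works for very small input spaces.
    for pair in itertools.product(range(256), repeat=2):
        payload = bytes(pair)
        assert binascii.a2b_hex(binascii.b2a_hex(payload)) == payload
```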
@brandonardenwalli I really appreciate your efforts to learn and contribute to CPython, but I would recommend starting with a different issue, because this one is very general and quite complex: not only because of the technical side, but also because of the related history and because different people have different opinions about it :)
Here's the list of issues you can look at: https://github.com/python/cpython/issues?q=is%3Aissue+is%3Aopen+label%3Aeasy
> https://github.com/python/cpython/issues?q=is%3Aissue+is%3Aopen+label%3Aeasy
Wow, this is really helpful information! Thanks for this link!
I was just thinking that there should be some way to organize the issues so it is easier to go through them, but I did not try this yet. I will look into this!
@rhettinger
> It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage up our test suite, make it run less deterministically, and not explicitly write out all the cases being covered.
But we already have Hypothesis as a part of our test suite and CI, and there are already some tests using it. Link: https://github.com/python/cpython/blob/5d936b64796261373429c86cdf90b1d8d8acefba/.github/workflows/build.yml#L361-L467 I propose adding more cases where it makes sense.
> In general, our tests should be specifically designed to cover every path of interest.
I agree that our regular tests should cover all paths, but there is more to it: even 100% path coverage is only as good as the data we push through those paths, and we are obviously limited in how much data we can write by hand. We cannot come up with lots of data ourselves; our current test suites prove my point.
But Hypothesis can. This is exactly what it is good at: providing lots of correctly structured data.
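A minimal sketch of what "correctly structured data" means in practice (the strategy and property here are chosen only for illustration):

```python
import datetime

from hypothesis import given, strategies as st


@given(st.dates())
def test_isoformat_round_trip(d):
    # Hypothesis supplies many valid dates (leap days, MINYEAR and MAXYEAR
    # boundaries, ...) that a hand-written table would rarely include.
    assert datetime.date.fromisoformat(d.isoformat()) == d
```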
> Hypothesis has a certain amount of randomness to it
There are different ways we control that randomness. First, we use a database of examples: https://github.com/python/cpython/blob/5d936b64796261373429c86cdf90b1d8d8acefba/Lib/test/support/hypothesis_helper.py#L22-L35 Plus, Hypothesis itself controls how data is generated so that runs are repeatable.
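A sketch of those knobs in plain `hypothesis` terms (the exact CPython wiring lives in the helper linked above; the database path and property below are illustrative):

```python
from hypothesis import example, given, settings, strategies as st
from hypothesis.database import DirectoryBasedExampleDatabase


# Failing examples are saved to the database and replayed first on later
# runs; derandomize=True makes generation itself deterministic.
@settings(database=DirectoryBasedExampleDatabase(".hypothesis/examples"),
          derandomize=True)
@given(st.integers())
@example(0)  # always run, independent of any random generation
def test_abs_is_non_negative(x):
    assert abs(x) >= 0
```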
> One issue we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules that they fully understand, and the invariants they mentally invent aren't the actual invariants: they reflect bugs in their own understanding rather than actual bugs in the code
This is a valid concern; I hope that collaboration among developers can solve it. And this is exactly why I created this issue: to figure out what is worth testing and what is not.
> There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.
I don't think that this is actually correct. First, we cannot know what bugs were found by Hypothesis, because people might just post the simplest repro without naming the tool.
Second, we have these bugs that do mention Hypothesis:
btw, I'm new to Hypothesis, and the `test_zoneinfo` property tests from @pganssle have lots of different use cases; the location is https://github.com/pganssle/zoneinfo/blob/master/tests/test_zoneinfo_property.py
Edit: Thanks for all your kind warnings, I didn't actually try to use it. I just stared into the abyss and it stared back at me. It's fascinating in an academic sense, the way some people read about the Australian funnel-web spider: https://en.wikipedia.org/wiki/Australian_funnel-web_spider
I do think we should be expanding hypothesis tests. There was no stipulation in the original acceptance that they would only be used for zoneinfo; the idea was to use property tests more widely.
We should at least start by migrating whichever tests in Zac's repo still make sense (probably most of them).
Since they don't run on all CI, they should always come with `@example`s, in which case they are essentially parametrized tests (the stubs run the examples).
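A sketch of that pattern, assuming the `test.support.hypothesis_helper` shim linked earlier in this thread (the import spelling and the property below are illustrative): when `hypothesis` is not installed, the stub simply runs each explicit `@example`, so the test degrades into a small parametrized test.

```python
from test.support.hypothesis_helper import hypothesis

st = hypothesis.strategies


@hypothesis.given(st.binary())
@hypothesis.example(b"")
@hypothesis.example(b"\x00\xff binary data")
def test_hex_round_trip(payload):
    # With real hypothesis: many generated inputs plus the examples above.
    # With the stub: only the two @example payloads are run.
    assert bytes.fromhex(payload.hex()) == payload
```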
@encukou and I have been working on adding some more tests + adding a devguide section about Hypothesis. https://github.com/python/cpython/pull/119345 will also be nice to have.
We now have both the code and CI job to run `hypothesis` tests.

In this issue I invite everyone to propose ideas: what can we test with it? Criteria:

- Tests should be suitable for the property-based nature of `hypothesis`
- Tests should be fast (because `hypothesis` will produce a lot of cases to test, we cannot do any slow / network related things)

What else?
Known applications
Property-based tests are great when there are certain patterns (see the sketch after this list):

- `decode` and `encode`
- `dumps` and `loads`
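A minimal sketch of the `dumps`/`loads` pattern, using plain `hypothesis` and an illustrative strategy for JSON-representable values:

```python
import json

from hypothesis import given, strategies as st

# Scalars plus arbitrarily nested lists and string-keyed objects.
json_values = st.recursive(
    st.none()
    | st.booleans()
    | st.integers()
    | st.floats(allow_nan=False, allow_infinity=False)
    | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    max_leaves=25,
)


@given(json_values)
def test_dumps_loads_round_trip(value):
    assert json.loads(json.dumps(value)) == value
```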
There are also some hidden patterns, for example: if `'a' in s`, then `s.index('a')` must return an integer (a minimal sketch follows below).
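A sketch of that hidden pattern; a restricted alphabet keeps the `'a' in s` branch from being vacuously rare:

```python
from hypothesis import given, strategies as st


@given(st.text(alphabet="abc"))
def test_in_agrees_with_index(s):
    if "a" in s:
        i = s.index("a")
        assert isinstance(i, int)
        assert s[i] == "a"
        assert "a" not in s[:i]  # index() returns the first occurrence
```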
Existing work
Good examples of existing stuff:
- `hypothesis` tests in `test_zoneinfo` by @pganssle

Linked PRs