python / cpython

[Meta] Research: what can we test with Hypothesis? #107862

Open sobolevn opened 1 year ago

sobolevn commented 1 year ago

We now have both the code and a CI job to run Hypothesis tests.

In this issue I invite everyone to propose ideas: what can we test with it? Criteria:

  1. Tests should be suitable for the property-based nature of Hypothesis
  2. The action under test should be fast and reliable (since Hypothesis will produce a lot of test cases, we cannot do anything slow or network-related)
  3. Strategies for data generation should be rather simple (I think we can get to more complex strategies later, but for now, for the sake of simplicity and maintainability, I propose not to get too crazy with them)

What else?
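For concreteness, here is a minimal sketch of the shape of test that fits these criteria; the property here is purely illustrative, not a concrete proposal:

```python
from hypothesis import given, strategies as st

# A fast, deterministic round-trip property over simply-generated data.
@given(st.text())
def test_utf8_roundtrip(s):
    # Property: decoding an encoded string returns the original string.
    assert s.encode("utf-8").decode("utf-8") == s
```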

Known applications

Property-based tests are great when there are certain patterns:

There are also some hidden patterns, for example:

Existing work

Good examples of existing work:

Linked PRs

sobolevn commented 1 year ago

Right now I've ported and modified several binascii tests from https://github.com/Zac-HD/stdlib-property-tests/blob/master/tests/test_encode_decode.py :)
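Those are round-trip properties, roughly like this sketch (the test names are illustrative; see the linked file for the originals):

```python
import binascii

from hypothesis import given, strategies as st

@given(st.binary())
def test_base64_roundtrip(payload):
    # b2a_base64 appends a trailing newline, which a2b_base64 ignores.
    assert binascii.a2b_base64(binascii.b2a_base64(payload)) == payload

@given(st.binary())
def test_hexlify_roundtrip(payload):
    # Hex-encoding then decoding must return the original bytes.
    assert binascii.unhexlify(binascii.hexlify(payload)) == payload
```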

Zac-HD commented 1 year ago

I'd suggest looking at PyPy's property-based tests, since they're known to be useful for developers of a Python implementation.

I'd also reconsider (3). While complicated strategies are obviously more work to develop and use, in my experience they're also disproportionately likely to find bugs - precisely because code working with complicated data structures is even more difficult to test without Hypothesis, and so there are usually weird edge cases lurking.

Examples of this effect include https://github.com/python/cpython/issues/84838, https://github.com/python/cpython/issues/86384, https://github.com/python/cpython/issues/89901, and https://github.com/python/cpython/issues/83134. The hypothesmith source-code strategies are at a pre-alpha / proof-of-concept stage and have more third-party dependencies so I'm not sure they'd be a good fit for CPython at this time, but you get the idea. I don't have capacity to implement it myself, but could supervise a volunteer or contractor to build a production-ready strategy if there's interest.
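To give a flavor of what a more complicated strategy looks like, here is a sketch of a recursive strategy for JSON-like data; it is illustrative, not taken from hypothesmith:

```python
import json

from hypothesis import given, strategies as st

# Scalars at the leaves; lists and string-keyed dicts as the branches.
json_values = st.recursive(
    st.none() | st.booleans() | st.floats(allow_nan=False) | st.text(),
    lambda children: st.lists(children)
    | st.dictionaries(st.text(), children),
    max_leaves=20,
)

@given(json_values)
def test_json_roundtrip(value):
    # Deeply nested containers are exactly where hand-written
    # test data tends to be thin.
    assert json.loads(json.dumps(value)) == value
```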

rhettinger commented 1 year ago

Personally, I don't think we should go down this path. Hypothesis has a certain amount of randomness to it. In general, our tests should be specifically designed to cover every path of interest.

When someone is designing new code, it is perfectly reasonable to use Hypothesis. However, for existing code, a "let's use Hypothesis" effort isn't directed at a specific problem. One issue we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules they fully understand, so the invariants they mentally invent aren't the actual invariants; they reflect bugs in their own understanding rather than actual bugs in the code. For example, we got reports of colorsys conversions not being exactly invertible; however, due to color gamut limitations they can't always be inverted (information is lost), and the conversion formulas are dictated by published standards. We also got false reports on the random module methods from people who just let Hypothesis plug in extreme values without any thought about what the method was actually supposed to do in real use cases. There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.
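The colorsys case boils down to a round-trip property along the lines of the sketch below, where the asserted invariant is the tester's assumption, not a documented guarantee:

```python
import colorsys

from hypothesis import given, strategies as st

unit = st.floats(min_value=0.0, max_value=1.0)

@given(unit, unit, unit)
def test_rgb_hls_roundtrip(r, g, b):
    # Exact invertibility is not guaranteed: floating-point rounding
    # and gamut limitations mean some triples do not survive the
    # round trip bit-for-bit, so a reported "failure" here is noise,
    # not a stdlib bug.
    assert colorsys.hls_to_rgb(*colorsys.rgb_to_hls(r, g, b)) == (r, g, b)
```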

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and no longer explicitly write out all the cases being covered.

brandonardenwalli commented 1 year ago

Hello, I am interested in learning more about this and adding proposals soon too, but I want to make sure I am looking at the same thing. I searched https://www.google.com/search?q=python+hypothesis and it shows https://hypothesis.readthedocs.io/en/latest/ and https://pypi.org/project/hypothesis/; is this correct?

Also, I am trying to understand this conversation. As I understand it, software should ideally be easy to maintain and test, and part of that includes writing tests that always find problems, no? But from the conversation it looks like these Hypothesis tests are somewhat random and will usually catch bugs, but not always? If so, why not catch the bugs always instead of only sometimes?

Also, to clarify:

Tests should be suitable for the property-based nature of Hypothesis

This "property-based" is refer to the same as https://docs.python.org/3/library/functions.html#property?

Anyway, one proposal I have is to write some code that generates all combinations of valid inputs and tests them to make sure the code works correctly. I think we could also do permutations of valid inputs, but I am not sure that would add extra usefulness (and the tests would become slower, since the number of permutations is usually much larger than the number of combinations).
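For example, something like this sketch, over a deliberately tiny input space since full enumeration explodes combinatorially:

```python
import itertools

def test_int_roundtrip_exhaustive():
    # Every (value, base) combination in a small, fully enumerable space.
    formatters = {2: bin, 8: oct, 10: str, 16: hex}
    for value, base in itertools.product(range(-50, 50), formatters):
        # int() with an explicit base inverts the matching formatter.
        assert int(formatters[base](value), base) == value
```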

sobolevn commented 1 year ago

@brandonardenwalli I really appreciate your effort to learn and contribute to CPython, but I would recommend starting with a different issue, because this one is very general and quite complex: not only because of the technical side, but also because of the related history, and because different people have different opinions about it :)

Here's the list of issues you can look at: https://github.com/python/cpython/issues?q=is%3Aissue+is%3Aopen+label%3Aeasy

brandonardenwalli commented 1 year ago

https://github.com/python/cpython/issues?q=is%3Aissue+is%3Aopen+label%3Aeasy

Wow, this is really helpful information! Thanks for this link!

I was just thinking that there should be some way to organize the issues so they are easier to go through, but I had not tried this yet. I will look into it!

sobolevn commented 1 year ago

@rhettinger

It would be perfectly reasonable to use Hypothesis outside of our test suite and then report an actual bug if found. Otherwise, I think actually checking in the h-tests would just garbage-up our test suite, make it run less deterministically, and no longer explicitly write out all the cases being covered.

But we already have Hypothesis as part of our test suite and CI, and there are already some tests using it. Link: https://github.com/python/cpython/blob/5d936b64796261373429c86cdf90b1d8d8acefba/.github/workflows/build.yml#L361-L467 I propose adding more cases where it makes sense.

In general, our tests should be specifically designed to cover every path of interest.

I agree that our regular tests should cover all paths, but there is more to it: path coverage tops out at 100%, while we are still limited by the amount of data we can feed through those paths. We cannot write out lots of data by hand; our current test suites prove my point.

But Hypothesis can. This is exactly what it is good at: producing lots of correctly structured data.

Hypothesis has a certain amount of randomness to it

There are different ways in which we control the randomness. First, we use a database with examples: https://github.com/python/cpython/blob/5d936b64796261373429c86cdf90b1d8d8acefba/Lib/test/support/hypothesis_helper.py#L22-L35 Second, Hypothesis itself controls how data is generated, so runs are repeatable.
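For reference, here is a sketch of the knobs involved; the profile names and database path are made up, and the real configuration lives in the linked hypothesis_helper.py:

```python
from hypothesis import settings
from hypothesis.database import DirectoryBasedExampleDatabase

# Persist failing examples so a failure found in one run (or on one
# CI machine) is replayed on every subsequent run.
settings.register_profile(
    "cpython-ci",
    database=DirectoryBasedExampleDatabase("./hypothesis-db"),
)

# derandomize=True makes Hypothesis generate the same cases every run,
# trading some exploration for full determinism.
settings.register_profile("deterministic", derandomize=True)

settings.load_profile("cpython-ci")
```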

One issue we've had with people pursuing a "let's use Hypothesis" goal is that they are rarely working on modules they fully understand, so the invariants they mentally invent aren't the actual invariants; they reflect bugs in their own understanding rather than actual bugs in the code

This is a valid concern; I hope that collaboration among developers can solve it. And this is exactly why I created this issue: to figure out what is worth testing and what is not.

There may have been one actual (but very minor) bug in the standard library found by Hypothesis, but everything else was just noise.

I don't think this is actually correct. First, we cannot know which bugs were found by Hypothesis, because people might just post the simplest repro without naming the tool.

Second, we have these bugs that do mention Hypothesis:

Refs https://github.com/python/cpython/issues/86275

blaisep commented 5 months ago

btw, I'm new to Hypothesis, and test_zoneinfo from @pganssle has lots of different use cases; it lives at https://github.com/pganssle/zoneinfo/blob/master/tests/test_zoneinfo_property.py


Edit: Thanks for all your kind warnings; I didn't actually try to use it. I just stared into the abyss and it stared back at me. It's fascinating in an academic sense, the way some people read about https://en.wikipedia.org/wiki/Australian_funnel-web_spider

pganssle commented 5 months ago

I do think we should be expanding the Hypothesis tests. There was no stipulation in the original acceptance that they would only be used for zoneinfo; the idea was to use property tests more widely.

We should at least start by migrating whichever tests in Zac's repo still make sense (probably most of them).

Since they don't run on all CI, they should always come with `@example` cases, in which case they are essentially parametrized tests (the stubs run the examples).
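Roughly, such a test looks like this sketch; the property itself is just an illustration:

```python
import unicodedata

from hypothesis import example, given, strategies as st

@given(st.text())
@example("")      # explicit cases run even when randomized generation
@example("café")  # is stubbed out, like an ordinary parametrized test
def test_nfc_idempotent(s):
    # Property: Unicode normalization is idempotent.
    once = unicodedata.normalize("NFC", s)
    assert unicodedata.normalize("NFC", once) == once
```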

Zac-HD commented 5 months ago

@encukou and I have been working on adding more tests, plus a devguide section about Hypothesis. https://github.com/python/cpython/pull/119345 will also be nice to have.