Add support for checking hypothesis tests

pschanely commented 3 years ago

You know what would be amazing? Symbolically executing tests written for Hypothesis! For example:

@given(st.integers(min_value=0))
def test(x):
    assert x ** 2 >= x

# could be equivalent to

def test(x: int):
    """pre: x >= 0"""
    assert x ** 2 >= x

this is particularly interesting to me because (a) I'm a core dev of Hypothesis 😜, (b) Hypothesis tests have to precisely describe the domain of valid inputs, and (c) they have a high density of assertions about error-prone behaviour!

I don't have time to write this myself (at least in the next few months), but would be delighted to help anyone who wants to have a go.

Originally posted by @Zac-HD in https://github.com/pschanely/CrossHair/issues/21#issuecomment-597468827

Zac-HD commented 3 years ago

See also this set of notes on the topic; and FYI I have a prospective Honours student looking at working on this 😁

Zac-HD commented 3 years ago

@JinghanCode is looking at this now, with the work-in-progress over at https://github.com/JinghanCode/CrossHair/pull/1 :tada:

Overall plan: start by translating a subset of Hypothesis strategies into CrossHair constraints, in order to get a baseline for the more general "choice sequence"-based approach (see notes, above). It's unclear at this stage whether we'd want to ship this as part of Hypothesis or CrossHair - it's pretty sensitive to unstable/internal implementation details, and there are many many tests it can't handle - but we do expect to learn something useful :grin:

pschanely commented 3 years ago

This so cool. Let me know how I can help. Starting out with it integrated on the CrossHair side makes sense to me, as it's far more experimental anyway. CrossHair will never detect 100% of cases, so even if we only handle basic cases, I'd be inclined to merge this kind of work :)

Zac-HD commented 3 years ago

This so cool. Let me know how I can help.

Will do! That's likely to consist of me pinging you for code review + advice on how to evaluate it in a few weeks-months, once the basics are working.

Starting out with it integrated on the CrossHair side makes sense to me, as it's far more experimental anyway. CrossHair will never detect 100% of cases, so even if we only handle basic cases, I'd be inclined to merge this kind of work :)

My hesitation is because it's depending on the details of actually-for-real unstable implementation details of Hypothesis, which would make for a poor user experience over time. But I'm also keen to ship handling for the cases we can handle somewhere, and with your support a hypothesis[crosshair] extra could be pretty cool - right up to "we proved this always passes, skipping input generation" 😁

pschanely commented 3 years ago

That's likely to consist of me pinging you for code review + advice on how to evaluate it in a few weeks-months, once the basics are working.

Great; reach out anytime, even if it's just for some help along the way. I've also set up regular CrossHair discussion time that's open to the public, here, on calendly. (but don't feel constrained to those days/hours)

My hesitation is because it's depending on the details of actually-for-real unstable implementation details of Hypothesis, which would make for a poor user experience over time. But I'm also keen to ship handling for the cases we can handle somewhere, and with your support a hypothesis[crosshair] extra could be pretty cool - right up to "we proved this always passes, skipping input generation" 😁

Got it. Yeah, it would be amazing to see this as a hypothesis extra. FWIW, if we do end up deciding to put this on the CrossHair side, I'm happy to make sure CrossHair stays compatible.

It makes sense to think about this as an experiment for now, but eventually it might be worth thinking about structuring the bridge logic in a way that is imaginable as part of the strategy interface. For instance, CrossHair would be perfectly happy with an ability to query arbitrary strategies for (1) a (possibly parameterized) type and (2) a post-processing function for mapping/filtering. (though 'm sure my mental model of a strategy is overly naive!)

pschanely commented 3 years ago

@JinghanCode and @Zac-HD : how did this investigation go? Looks like you got somewhere with it, and I am very curious about your overall findings!

Zac-HD commented 3 years ago

https://github.com/JinghanCode/Crosshair-Hypothesis/pull/1/files shows how far we got: based on some runtime introspection, we can analyse @given(...) tests with st.integers(...) - as well as one_of(), .map(), and .filter()!

The challenge for future work is to investigate how this scales out to cover more strategies (e.g. st.lists()), or to try plugging in at the lower-level conjecture (choice-sequence) interface as we discussed here.

pschanely commented 3 years ago

Note: a very basic and slow integration now exists (see this post about it)! Try it out with crosshair watch --analysis_kind=hypothesis <file.py>

The next steps to make this more effective include some refactoring on the hypothesis side; this issue in particular.

pschanely commented 2 years ago

Now when CrossHair finds a hypothesis counterexample, it's saved to the hypothesis database. That means even regular runs of hypothesis will try inputs that CrossHair has found! (via #185; thank you @Zac-HD !!)

pschanely commented 3 months ago

For anyone bumping into this ticket:

Hypothesis now supports running CrossHair as an optional backend via this plugin!

This integration is much more effective at bug-finding than running crosshair with the --analysis_kind=hypothesis option. (and we might deprecate this option in preference to the plugin at some point in the future)

pschanely / CrossHair

Add support for checking hypothesis tests #45