pschanely opened 4 weeks ago
@Zac-HD Done with one round of investigations and fixes! (hypothesis-crosshair 0.0.13 and crosshair-tool 0.0.68) Breaking up the tests into categories:

`minimize()` tests - should we never be running these with CrossHair? I see we try to avoid running it here, although some of the tests that are timing out appear to have `_current_build_context.value == None` under the crosshair testing profile. Affected tests:

- test_can_generate_both_zeros_when_in_interval
- test_mutual_recursion
- test_hashable_type_unhashable_value
- tests/nocover/test_simple_numbers.py
Nested tests (used to time out, but now CrossHair crashes fast):

- test_nesting_with_control_passes_health_check

Just plain better, due to recent updates:

- test_drawing_many_near_boundary
- test_lookup_knows_about_all_core_strategies (though still slowish)
- test_fullmatch_generates_example - this gets a solver unsat here; ideally this will raise `hypothesis.errors.BackendCannotProceed` once that's available ... and would keep trying.
- test_hard_to_find_single_example
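The "raise and keep trying" idea above can be sketched with a stdlib-only stand-in; the `BackendCannotProceed` class and `draw_via_solver` helper here are hypothetical illustrations, not the actual Hypothesis or CrossHair API:

```python
class BackendCannotProceed(Exception):
    """Hypothetical stand-in for hypothesis.errors.BackendCannotProceed."""

def draw_via_solver(solve):
    """Turn a solver 'unsat' into a retryable signal instead of a crash.

    `solve` is an illustrative callable returning either a value
    or the string "unsat".
    """
    result = solve()
    if result == "unsat":
        # Tell the engine to abandon this input and keep trying others,
        # rather than failing the whole test run on a solver dead end.
        raise BackendCannotProceed("solver returned unsat for this path")
    return result
```

The point is only the control flow: an unsat result becomes "skip this input", matching the "would keep trying" behaviour described above.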
For me to continue to investigate:

- tests/nocover/test_health_checks.py - not sure I'll be able to get `too_large_strategy` to work, but will poke at it a little. The other tests complete after ~minutes.
- tests/nocover/test_type_lookup_forward_ref.py no longer times out, but now yields a strange `SimpleDict object is not callable` error.

---

Sweet! I'll update shortly and see how it goes 😁
> `minimize()` tests - should we never be running these with CrossHair? I see we try to avoid running it here, although some of the tests that are timing out appear to have `_current_build_context.value == None` under the crosshair testing profile.

That skip is specific to running `minimal(...)` _inside_ `@given()`, i.e. instead of crashing with the nested-tests error we just skip that test. On investigation there's a very obvious guess as to why it's super slow, too: `minimal()` has to find a counterexample before it can shrink that, and if Crosshair happens to not find one we can spend an awfully long time going through the `max_examples=50_000` (!!!) iterations.
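The two-phase cost described here (find first, shrink second) can be illustrated with a toy stdlib-only loop; `toy_minimal` and its constants are hypothetical, not Hypothesis internals:

```python
import random

def toy_minimal(predicate, max_examples=50_000, seed=0):
    """Toy sketch: find *any* satisfying integer, then greedily shrink it.

    If the predicate is rarely true, the finding loop can burn through
    the entire max_examples budget before shrinking even starts.
    """
    rng = random.Random(seed)
    found = None
    for _ in range(max_examples):       # finding phase: dominates the cost
        candidate = rng.randrange(1_000_000)
        if predicate(candidate):
            found = candidate
            break
    if found is None:
        raise ValueError("no counterexample found within budget")
    while True:                         # shrinking phase: cheap by comparison
        for smaller in (0, found // 2, found - 1):
            if smaller < found and predicate(smaller):
                found = smaller
                break
        else:
            return found
```

For example, `toy_minimal(lambda x: x >= 100)` shrinks whatever it first finds down to 100, but `toy_minimal(lambda x: False)` spends all 50,000 iterations finding nothing before giving up.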
My action items:

- Why `.undiscovered`? To investigate later.
- `hypothesis.errors.BackendCannotProceed`?

---

> That skip is specific to running `minimal(...)` _inside_ `@given()`, i.e. instead of crashing with the nested-tests error we just skip that test. On investigation there's a very obvious guess as to why it's super slow, too: `minimal()` has to find a counterexample before it can shrink that, and if Crosshair happens to not find one we can spend an awfully long time going through the `max_examples=50_000` (!!!) iterations.

Ah! I get it now. So I'll continue to look at those too.
---

Very hacky, but... done. Well, maybe, I'll see what CI thinks after I sleep 😄
---

OK, there are quite a few failures but I did mess around with the internals a bit and overall it's looking like an improvement. Quick triage:

- `minimal()` etc: https://github.com/HypothesisWorks/hypothesis/pull/4090
- `BackendCannotProceed` changes to a separate PR
- `TypeError: unhashable type: 'tzutc'` (`'Rule'`, etc) regressions, failing on the Crosshair backend but not Hypothesis
- `Unsatisfiable: Could not find any examples from floats() that satisfied lambda x: x + 1 == x and not math.isinf(x)`
- `LazyStrategy.__repr__` due to `if k not in sig.parameters or v is not sig.parameters[k].default`?

---

I got sidetracked on a few subtle-but-important issues this week, but this is still a big priority! I'll be looking at these in the next few days.
---

0.0.70 fixes that ugly regression with unhashables!

The following two issues are going to take me a while, but, honestly, might legitimately be the next things to look at.

- [ ] (Phillip) `Unsatisfiable: Could not find any examples from floats() that satisfied lambda x: x + 1 == x and not math.isinf(x)` - I'll need to tackle the issue of real float semantics for this one. High value, ~high effort.
- [ ] (???) strategy reprs fail to omit default values under crosshair, e.g. here; unclear where that should be fixed. Might be me in `LazyStrategy.__repr__` due to `if k not in sig.parameters or v is not sig.parameters[k].default`?
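For the `floats()` item above: the property is genuinely satisfiable under real IEEE-754 semantics, which is what makes exact float support worthwhile. A quick stdlib check, where `2 ** 53` is just one convenient witness (not what Hypothesis or CrossHair would necessarily generate):

```python
import math

x = float(2 ** 53)        # above 2**53, doubles can't represent every integer
assert x + 1 == x         # adding 1 is absorbed by round-to-nearest-even
assert not math.isinf(x)  # and x is still finite
```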
---

Your guess is right - the failing int bound is a symbolic. I have a branch on the backburner for more accurate identity semantics. I hadn't thought about faithful integer intern'ing as part of that (the biggest wins relate to avoiding path forks on enums and optional values), but I think I could make it happen.
As for whether the identity check on the hypothesis side is objectively correct, I'm unsure. Integer intern'ing isn't guaranteed across implementations. But in order for your test to fail, I think integer intern'ing would have to switch up mid-execution, which seems pretty unlikely.
---

For the identity check, I'm not (deliberately) checking interned integers, but rather whether the value is the default argument - so that we show even the default value if it was explicitly passed. Integers usually defeat that, but after a quick investigation I've classified this as "not expected to pass under crosshair", so we're done.
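The way an `is`-comparison against a signature default is defeated by non-interned integers can be reproduced with a small stdlib check; `show` and the value 1000 are illustrative, not the real strategy signature (behaviour shown is CPython's, where only small ints are interned):

```python
import inspect

def show(x=1000):
    """Illustrative function with a non-small-int default."""

sig = inspect.signature(show)
default = sig.parameters["x"].default

fresh = int("1000")          # equal to the default, but a freshly built object
assert fresh == default      # equality holds...
assert fresh is not default  # ...but identity doesn't: 1000 isn't interned
```

So a value that is *equal* to the default still fails the `v is sig.parameters[k].default` test, which is why integers "usually defeat" the check.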
We're actually running low on important semantic issues, so I'd probably turn next to performance issues - there are some very slow tests that I've just skipped because they take several minutes, or hours, or don't finish at all within the 6h Actions time limit. Current commit with those is https://github.com/HypothesisWorks/hypothesis/pull/4034/commits/bb432101541c618762b1024b10e277658105baf8
Originally posted by @Zac-HD in https://github.com/pschanely/CrossHair/issues/292#issuecomment-2289529225