So, the plugin will dump CrossHair's internal debug output if I set DEBUG_CROSSHAIR=1 in the environment, and that already has most (but not all) of what we'd get from this integration. Coverage data, in particular, would be a really nice addition.
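For reference, a minimal sketch of flipping that switch from Python - assuming the variable only needs to be present in the process environment before the plugin reads it:

```python
# Enable the hypothesis-crosshair debug dump described above; setting
# DEBUG_CROSSHAIR=1 in the shell before invoking pytest works equally well.
import os

os.environ["DEBUG_CROSSHAIR"] = "1"  # must happen before the plugin checks it
```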
Even if I were a user interested in crosshair+hypothesis, I'm not sure it's practical without this kind of observability: otherwise, I really have no clue whether I should be running any given test under CrossHair or not.
So, I'm very much +1. Some early thoughts:
- `status_reason`s that we could provide; distinguishing exceptions like `UnknownSatisfiability` vs `PathTimeout` vs `CrossHairUnsupported` is important for understanding what to investigate next.
- `event()` arguments; we just need to figure out what our stance is on mutating values.

Sweet - let's do the MVP thing for now: you provide a new method (see below), and I'll hook it up to our observability output 🙂 If collecting the data is at all expensive, you can make it conditional on `if TESTCASE_CALLBACKS:`, that being a list defined in `hypothesis.internal.observability`.
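A hedged sketch of that guard, where `expensive_snapshot()` stands in for whatever per-test-case bookkeeping the backend does:

```python
# TESTCASE_CALLBACKS is a list of registered observability callbacks; it is
# empty (and therefore falsy) when observability is disabled, so we can skip
# the expensive work in the common case.
from typing import Any, Callable, Dict

from hypothesis.internal.observability import TESTCASE_CALLBACKS


def maybe_collect(expensive_snapshot: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    if TESTCASE_CALLBACKS:
        return expensive_snapshot()
    return {}
```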
https://github.com/HypothesisWorks/hypothesis/pull/4083 is merged! New methods for you to (optionally) implement:
```python
from typing import Any, Dict, Iterable, Literal, TypedDict, Union


class _BackendInfoMsg(TypedDict):
    type: Literal["info", "alert", "error"]
    title: str
    content: Union[str, Dict[str, Any]]


class PrimitiveProvider:
    ...

    def observe_test_case(self) -> Dict[str, Any]:
        """Called at the end of the test case when observability mode is active.

        The return value should be a non-symbolic json-encodable dictionary,
        and will be included as `observation["metadata"]["backend"]`.
        """
        return {}

    def observe_information_messages(self, *, lifetime) -> Iterable[_BackendInfoMsg]:
        """Called at the end of each test case and again at end of the test function.

        Return an iterable of `{type: info/alert/error, title: str, content: str|dict}`
        dictionaries to be delivered as individual information messages.
        (Hypothesis adds the `run_start` timestamp and `property` name for you.)
        """
        assert lifetime in ("test_case", "test_function")
        yield from []
```
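To make the contract concrete, here's a hedged sketch of a backend implementing both hooks - not the actual hypothesis-crosshair code; `paths_explored` and `last_giveup` are hypothetical bookkeeping fields:

```python
from typing import Any, Dict, Iterable, Optional


class SketchProvider(PrimitiveProvider):
    # Hypothetical counters, updated elsewhere in the provider as it runs.
    paths_explored: int = 0
    last_giveup: Optional[str] = None  # e.g. "PathTimeout"

    def observe_test_case(self) -> Dict[str, Any]:
        # Must be realized (non-symbolic) and json-encodable; ends up under
        # observation["metadata"]["backend"].
        return {"paths_explored": self.paths_explored}

    def observe_information_messages(self, *, lifetime) -> Iterable[_BackendInfoMsg]:
        assert lifetime in ("test_case", "test_function")
        # Summarize once per test function rather than once per test case.
        if lifetime == "test_function" and self.last_giveup is not None:
            yield {
                "type": "alert",
                "title": "backend limitation",
                "content": f"last give-up reason: {self.last_giveup}",
            }
```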
This looks great. My next round of releases will include some useful output here.
I think this is done! If we want to track possible additional information, let's make that a separate issue?
Agree!
Hypothesis' observability features were originally designed for the benefit of users (https://github.com/HypothesisWorks/hypothesis/pull/3797), but have also turned out to be useful for developers - e.g. this discussion led to bugfixes and performance improvements.
Recent work on https://github.com/HypothesisWorks/hypothesis/pull/4034 has me thinking that observability information from Crosshair would actually be pretty valuable for us as maintainers, to answer questions like:
(if it's all of them, are we adding much value? what metrics would answer that?)
Practically speaking, what do we actually need to do here?

- decide what to measure (schema here); a hedged sketch of the resulting observation shape follows this list
  - `status_reason`, if something on the crosshair side was responsible for a `Status.INVALID` result, as if for `assume(False)`. I considered whether we need custom statuses for e.g. #21 and concluded that the `status_reason` metadata was a better fit.
  - `features`; they're intended to be about the runtime behavior of the code under test. `event()` is currently disabled under Crosshair to reduce premature realization, but we could support it nicely by e.g. deferring it to the end of the test case, where we realize anyway.
  - `timing` observations are meant to be disjoint, so that the sum is total runtime. We therefore probably want to put internal timings in the `metadata` instead.
  - `metadata`: the catch-all put-anything-here section.
- connect it up: add a method to the `PrimitiveProvider` protocol and call it here
- use the new information to understand what's going on, improve hypothesis-crosshair, iterate on observability, etc.
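As a concrete illustration of the fields above, a hedged sketch of one test-case observation; the values and the backend metadata keys are made up:

```python
# Hypothetical observation following the schema discussion above: an invalid
# result caused by the backend is reported as "gave_up" (as if for
# assume(False)), with status_reason explaining why.
observation = {
    "type": "test_case",
    "status": "gave_up",
    "status_reason": "CrossHair: PathTimeout while exploring this path",
    "features": {},  # reserved for the runtime behavior of the code under test
    "timing": {"execute:test": 0.012},  # disjoint parts summing to total runtime
    "metadata": {
        # internal backend timings go here rather than in "timing"
        "backend": {"smt_solver_seconds": 0.35},  # from observe_test_case()
    },
}
```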
Would you find this useful? If so, setting up metadata passthrough would be pretty easy 🙂