jspecify / jspecify

An artifact of fully-specified annotations to power static-analysis checks, beginning with nullness analysis.
http://jspecify.org
Apache License 2.0

Recast "samples" as conformance tests #232

Open kevinb9n opened 2 years ago

kevinb9n commented 2 years ago

Any checker that wants to verify that it conforms to our spec will want a test suite its authors can run. Our samples are already a great basis for that, but so far they have been a little... coy about it.

This issue could mean that we provide a fully runnable test suite (where tool authors would supply a small plugin that invokes their analyzer), but it doesn't necessarily have to.

We could instead still expect tools to provide their own harness, and simply rename and redocument the current contents of /samples/ so that they look more like conformance tests (or some other term).

Either way, what does passing these tests validate?

First, the tests are not testing the entire tool, just whatever "library" within that tool does the basic interpretation of JSpecify annotations and draws factual conclusions from them. Our tests would validate that that part of the tool is drawing those conclusions correctly: can the library classify each situation appropriately as contradictory, redundant, unrecognized, a type mismatch, etc.? Ideally the suite would also validate that the library isn't reporting additional conclusions not supported by our spec.
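To make that concrete, here is a hypothetical sketch of the kind of library surface such tests would exercise. None of these names come from JSpecify or from any real tool; they only illustrate the separation between drawing conclusions and acting on them.

```java
// Hypothetical sketch only; JSpecify defines no such API. It illustrates the
// "library" layer inside a tool that the conformance tests would exercise.
import java.nio.file.Path;
import java.util.List;

/** The classifications mentioned above. */
enum Kind {
  CONTRADICTORY, // annotations that conflict with each other
  REDUNDANT,     // an annotation that restates what is already implied
  UNRECOGNIZED,  // an annotation in a location where it has no defined meaning
  TYPE_MISMATCH  // a nullness mismatch between two types
}

/** One factual conclusion drawn about a specific location in the sources. */
record Conclusion(Path file, long line, Kind kind) {}

interface AnnotationInterpreter {
  /** Reads the JSpecify annotations in the given sources and draws conclusions from them. */
  List<Conclusion> interpret(List<Path> sources);
}
```

A conformance run would feed each sample through `interpret` and assert both that every expected conclusion appears and, ideally, that no unsupported ones do.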

The tool itself would of course use that library, but it is still free to do whatever it wants with the information the library provides, and we are not testing that behavior.

To be sure, there are "bad things" a tool could do that we would not like, but which these tests would not catch. I think that's fine, though; we are simply testing the tool's ability to read the information supplied by the annotations correctly, and as far as I know that's enough.

I suggest this is worth doing for 1.0 because it will make the whole process of getting tools on board with 1.0 easier.

Previous form: #178 <--- please rescue any history from it that you find useful

cpovirk commented 2 years ago

The latest evolution of my thinking has been along these lines:

It makes sense to view the samples as "some tests for a naive JSpecify-only nullness checker." That should make them useful as a starting point for tests for other tools.

For example, for the checker that we've been using internally, we've set up our own branch of the tests, tweaked to reflect the known bugs in that checker and the extra features that it provides.
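As a sketch of what such a starting point might look like: the samples mark expected findings with comments (spelled here as `jspecify_nullness_mismatch`; take the exact convention from the samples themselves), so a tool-specific harness can extract those expectations and diff them against its checker's diagnostics.

```java
// Illustrative harness fragment: extract the lines on which a sample expects
// a finding, so a tool-specific test can compare them against the checker's
// actual diagnostics. Assumes a marker-comment convention in which the
// marker appears on the line before the code in question.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class SampleExpectations {
  private static final String MARKER = "jspecify_nullness_mismatch";

  /** Returns the 1-based line numbers on which the sample expects a finding. */
  static List<Integer> expectedFindingLines(Path sampleFile) throws IOException {
    List<Integer> expected = new ArrayList<>();
    List<String> lines = Files.readAllLines(sampleFile);
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).contains(MARKER)) {
        // The marker comment annotates the line of code that follows it.
        expected.add(i + 2);
      }
    }
    return expected;
  }

  private SampleExpectations() {}
}
```

A branch "tweaked to reflect the known bugs" then amounts to maintaining a small list of samples where the tool is expected to deviate from these expectations.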

That seems sufficient to me, though please consider this an invitation not just to you but also to tool authors and end users to tell me if I'm wrong :)

In particular, I'm not sure what the idea of "JSpecify conformance" adds to this. It seems like there are two possibilities:

  1. Tool authors implement conformance testing as a separate "mode" for their tools. If so, this requires them to put at least some time into changes that by definition are not visible to their users. Given how difficult it is to implement a nullness checker, I want to avoid anything that could be seen as a hurdle.
  2. Tool authors implement conformance testing by changing the user-visible behavior of their tools. If this is what we want, then I see your framing of conformance tests as coy in its own way: We're still saying "Your tool could produce no output to users and still be 'conforming,'" but we're implying that it probably should produce output in this set of cases.

I prefer a more direct approach: Here's what a naive tool could show to users when given these inputs. That makes a reasonable baseline for tool behavior. If a tool author departs from that baseline, the author should give some thought to why -- whether that's "We haven't had time to implement that yet," "We're actually doing something better because we also know x," or whatever the case may be. I think that's enough for tool authors and end users to make reasonable decisions.

Maybe this is all compatible with what you're aiming for? It has felt to me like there's some additional goal that you have that I haven't identified, but I may have that wrong.

msridhar commented 2 years ago

We've been working a bit on getting NullAway to run on these samples and thinking about how to do so. It'd be good to have a document at some point recommending a workflow for tools to use when incorporating the samples. E.g., if the recommended workflow is to fork this repo, make a branch, and tweak the tests there as needed, we can go with that. It would be a bit of a pain, in that for CI testing we might need to clone another repo or add a submodule to get the relevant test inputs. But if we just make a copy of the samples in the NullAway repo, syncing from upstream may be more painful. I guess part of the question depends on how often we expect the existing samples to be modified, as opposed to just having new samples added.

Eventually, it might be nice for the samples to be versioned somewhere, so it's easier to track when changes are made.

KengoTODA commented 2 years ago

Just FYI: I'm already using the samples as conformance tests for spotbugs-jspecify-plugin: https://github.com/spotbugs/spotbugs-jspecify-plugin/blob/d8afc443db6cc6949cfb2a70b418177ae9217f1d/src/test/java/com/github/spotbugs/jspecify/TestWithSample.java

SpotBugs checks .class files rather than .java files, so there are several difficulties in using the samples for tests. But it's valuable to know where we have room to improve.
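For what it's worth, a bytecode-level tool can compile each sample with the standard javax.tools API before analyzing it. The sketch below stops at producing the .class files; the classpath (assumed here to hold the JSpecify annotations jar) and the hand-off to SpotBugs are tool-specific and omitted.

```java
// Sketch: compile a sample source file to .class files that a bytecode-level
// checker such as SpotBugs can then analyze. Uses only the standard
// javax.tools API; the SpotBugs invocation itself is omitted.
import java.io.File;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;

final class CompileSample {
  /** Compiles sampleJava into outputDir, with the JSpecify annotations on the classpath. */
  static void compile(File sampleJava, File outputDir, File jspecifyJar) throws Exception {
    JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
    try (StandardJavaFileManager fileManager =
        compiler.getStandardFileManager(null, null, null)) {
      List<String> options =
          List.of("-d", outputDir.getPath(), "-classpath", jspecifyJar.getPath());
      boolean ok =
          compiler
              .getTask(null, fileManager, null, options, null,
                  fileManager.getJavaFileObjects(sampleJava))
              .call();
      if (!ok) {
        throw new IllegalStateException("sample failed to compile: " + sampleJava);
      }
      // The .class files under outputDir are now ready for the checker.
    }
  }

  private CompileSample() {}
}
```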

netdpb commented 8 months ago

We're working on replacing the samples with conformance tests, but we've decided this is not blocking for the 1.0 release.