proptest-rs / proptest

Hypothesis-like property testing for Rust
Apache License 2.0

Recommendation to check in proptest regressions seems off #440

Open vlovich opened 2 months ago

vlovich commented 2 months ago

Most of my experience applying property testing in production comes from JS's fast-check, where I found myself adding explicit test cases whenever I encountered a bug. That way I can explicitly document the specific conditions that lead to the bug & have an explicitly documented test case for the regression, while leaving the property testing to continue randomly exploring for other potential bugs. The other advantage is that even if something subtle about the property generation changes, the regression test remains durably in the code, whereas the old regression seed would now be testing an irrelevant case & not providing the protection intended.
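A minimal Rust sketch of that workflow with proptest (the parse_duration function and the issue name are hypothetical; the point is that the random property and the pinned regression share one check):

```rust
use proptest::prelude::*;

// Hypothetical function under test.
fn parse_duration(s: &str) -> Option<u64> {
    s.strip_suffix("ms").and_then(|n| n.parse().ok())
}

// The check itself, factored out so both the random exploration and the
// pinned regression can call it.
fn check_roundtrip(millis: u64) {
    let rendered = format!("{millis}ms");
    assert_eq!(parse_duration(&rendered), Some(millis));
}

proptest! {
    // Random exploration keeps searching for new failures.
    #[test]
    fn roundtrip_any_duration(millis in any::<u64>()) {
        check_roundtrip(millis);
    }
}

// A bug the property test once found, pinned as a plain unit test so it
// survives strategy changes and documents the failing input directly.
#[test]
fn regression_some_issue_zero_duration() {
    check_roundtrip(0);
}
```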

The other problem is that every regression you add removes one random fuzz attempt from your test run. At the limit, if you have 250 issues in your crate found through property testing, you won't notice that your property test is now just rerunning those regressions on each run instead of trying to find new issues. Yes, you can increase the number of cases, but you have to remember to do this, & the value of running regressions through proptest may not be as good (proptest will be slower at running those regressions, + the ability to catch a regression depends on the strategies never changing & the property test remaining static).

Of course, maybe I'm misusing proptest & it's expected to be used differently.

This might be related to https://github.com/proptest-rs/proptest/issues/439, in that things would work a bit more smoothly if proptest only recorded the seed being attempted at the start of the run and removed it on success (thus rerunning it if proptest aborted for any reason).

matthew-russo commented 2 months ago

I see a few related but different topics here:

explicit test cases

We're looking at adding first-class support for these so you can share test code between arbitrary cases and well-known edge cases. I think that longer term this will be an ergonomic approach to dealing with this: you can disable persistence and just list out your explicit cases.
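A sketch of roughly what that looks like today, assuming a placeholder check function: persistence can already be disabled through ProptestConfig's failure_persistence field, with the well-known cases listed in an ordinary unit test:

```rust
use proptest::prelude::*;

// Placeholder shared check; the property asserted here is trivial.
fn check(input: u32) {
    assert!(input.checked_add(1).is_some() || input == u32::MAX);
}

proptest! {
    // Persistence disabled: no .proptest-regressions file is written or
    // replayed, so regressions live only as the explicit test below.
    #![proptest_config(ProptestConfig {
        failure_persistence: None,
        ..ProptestConfig::default()
    })]

    #[test]
    fn random_cases(input in any::<u32>()) {
        check(input);
    }
}

// Well-known edge cases, listed out explicitly.
#[test]
fn explicit_cases() {
    for input in [0, 1, u32::MAX - 1, u32::MAX] {
        check(input);
    }
}
```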

validity of a seed against changing properties

I don't have a perfect answer for this one. It's definitely an issue, especially for areas that have some churn in the underlying data that gets generated. I don't think this makes persistence fundamentally useless, but there are definitely cases where it becomes borderline useless relative to its original goal. The few alternative/complementary approaches I can think of off the top of my head aren't foolproof either.

persisted cases decrease from the number of fuzzed cases

This one isn't quite right: we don't count the persisted cases in the total number of runs. If you ask for 10 cases and have 5 persisted seeds, we'll run the 5 persisted seeds first (the case counter is still at 0 afterwards) and then run the 10 cases with newly generated data. This does come with the downside of increased execution time, but if we assume the fidelity of the persisted seeds is good, it's effectively the same run time as if you had manually written out the 5 explicit cases.
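To make the counting concrete, a sketch (the property is a placeholder, and the regression-file line quoted in the comment is only illustrative):

```rust
use proptest::prelude::*;

proptest! {
    // With cases = 10 and, say, 5 entries in the persisted
    // .proptest-regressions file (lines roughly of the form
    //   cc deadbeef... # shrinks to input = 0
    // ), proptest replays the 5 persisted seeds first and then still runs
    // 10 freshly generated cases, for 15 executions total.
    #![proptest_config(ProptestConfig::with_cases(10))]

    #[test]
    fn my_property(input in any::<u64>()) {
        prop_assert_eq!(input.wrapping_add(0), input); // placeholder assertion
    }
}
```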

explicit cases vs seed, "misusing proptest", etc

I'd like to think there are almost no "misuses" of proptest out there -- if you're getting some value out of testing with it then that's a pretty good use. That said, there are definitely different ways to use proptest, and we probably don't have the best ergonomics for all of them. The ones I'm most familiar with in my own projects fall into two rough categories:

- stateless tests, where a property of the generated data is checked directly
- stateful tests, where the generated data drives a stateful system

For the first category, it's fair to say that a workflow to write the manual test case without the indirection of persisted RNG seeds may work better for you. For the second category it's a bit less straightforward: the generated data interacts with a stateful system, and it's not as trivial to stage an explicit test that elicits the exact behavior of a bug (see the sketch below).
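A sketch of that second category, assuming a toy MyQueue as the stateful system under test: a generated sequence of operations drives both the system and a trivial model, so a failure depends on the whole interleaving rather than a single input:

```rust
use proptest::prelude::*;
use std::collections::VecDeque;

// Hypothetical system under test.
#[derive(Default)]
struct MyQueue {
    inner: VecDeque<u32>,
}

impl MyQueue {
    fn push(&mut self, v: u32) {
        self.inner.push_back(v);
    }
    fn pop(&mut self) -> Option<u32> {
        self.inner.pop_front()
    }
}

// Operations to generate; the sequence, not any one value, elicits bugs.
#[derive(Clone, Debug)]
enum Op {
    Push(u32),
    Pop,
}

fn op_strategy() -> impl Strategy<Value = Op> {
    prop_oneof![any::<u32>().prop_map(Op::Push), Just(Op::Pop)]
}

proptest! {
    #[test]
    fn behaves_like_fifo(ops in proptest::collection::vec(op_strategy(), 1..64)) {
        let mut sut = MyQueue::default();
        let mut model = VecDeque::new();
        for op in ops {
            match op {
                Op::Push(v) => {
                    sut.push(v);
                    model.push_back(v);
                }
                // Compare the system against the model at every step.
                Op::Pop => prop_assert_eq!(sut.pop(), model.pop_front()),
            }
        }
    }
}
```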

vlovich commented 2 months ago

FWIW, I find property tests hard enough to read and write that extracting the regression into a standalone, simple test case is still usually a better mechanism for documenting the regression than trying to verify it through property testing (+ it's typically faster to run the regression that way too).

There's just too much else that can go wrong beyond the arbitraries that are generated. Changes to custom strategies, I imagine, can't really be caught, but even aside from that, changes to the assertions written in the property test can invalidate the persisted seed without any signal to the harness.