silentbicycle / theft

property-based testing for C: generate input to find obscure bugs, then reduce to minimal failing input
ISC License
610 stars 31 forks source link

Structural inference could inform shrinking #23

Open silentbicycle opened 7 years ago

silentbicycle commented 7 years ago

Along with dropping/mutating bit pool requests individually, it may be beneficial to infer structure and work in larger units.

theft currently saves the request size information -- for a given bit pool, an alloc callback might request [12, 1, 4, 8, 1, 4, 8, 1, 4, 8, 1, 4, 8, 1, 4, 8, ...] bits at a time. Instead of randomly choosing a set of requests and then mutating/dropping the bits corresponding to them, it could be more effective to work on the [1, 4, 8] groups.

The structure inferred by Sequitur seems like a good approach.

DRMacIver commented 7 years ago

FWIW the way this works in Hypothesis is that the data generation is just explicitly annotated for hierarchical information - you mark example start and stop boundaries and it uses that to create the hierarchical structure (though the fact that it's hierarchical is used only lightly).

I did some work on using sequitur in https://github.com/DRMacIver/structureshrink, but I recall the results being a bit so so (I don't remember the problems. It definitely worked, it just didn't seem to work better than what I already had, which was to look at repeated ngrams and split based on that)