haf / expecto

A smooth testing lib for F#. APIs made for humans! Strong testing methodologies for everyone!
Apache License 2.0
677 stars 96 forks source link

Fuzz testing #81

Open haf opened 7 years ago

haf commented 7 years ago

I'd like expecto to support parametisation for three things:

Preferably this issue can start the implementation of the F# version for the American Fuzzy Lop fuzzer. There's an existing project called Fizil aiming to do the same, by @CraigStuntz – and generators (and shrinkers) are well implemented by @kurtschelfthout and @mausch – perhaps fed by FuzzDB. Can logic from these be folded into Expecto or be orchestrated by a separate nuget?

In the end, I want people to write articles like CloudFlare's for DNS servers written in F#. Or with this infrastructure, maybe I could combine freya-machines with Suave.Testing to allow people to auto-test their REST interfaces and really have the testing framework guide their implementation. Or we could finally implement TLS 1.2 support on managed Mono/.Net Core without being afraid to release it. (And now TLS 1.3 of course!) We could pass expecto a flag to fuzz rather than antagonising.

Other existing work in the F# space is F* by @catalin-hritcu, @msprotz, @s-zanella amongst others. Can the session types and their modelling of protocols or Z3, the SMT solver, be used to inform the fuzzing as a fuzzing strategy or by modelling control flow?

The aim of Expecto is to bring strong testing methodologies to everyone. It's brought about from my perceived high friction of building bug-free software on .Net.

Use cases:

Can we all cooperate to make Expecto the go-to place for strong testing methodologies on .Net and .Net Core, removing friction and making it simple, even fun, to write secure, stable and well-tested software?

AnthonyLloyd commented 7 years ago

This looks very interesting. At this point I have no experience of it but will definitely read up.

CraigStuntz commented 7 years ago

So, as it happens, I just gave a CodeMash presentation on this topic. Yesterday, in fact. Here are the slides, with and without speaker notes.

There's a lot to unpack in this issue. Certainly, "Use Fizil/AFL in Expecto" is probably not going to be a great combination. On the other hand, both property based testing and fuzzing are special cases of specification-based random testing, so @haf is not wrong to suggest there may be room for good work to be done. I'll follow with some comments on Z3.

Fuzzing vs. Property-Based Testing

There's a fundamental difference in the way people tend to use unit tests and fuzzers: Fuzzers tend to be used on a whole application for a long time (possibly thousands of cores for weeks), whereas unit tests are expected to be very fast and tend to test a single function.

AFL

AFL has a few core ideas which are worthy of thought:

Dumb Fuzzing

An ongoing (friendly) debate in the security community is whether fuzzers should be "smart" (format aware) or "dumb" (format-ignorant). An example of a smart fuzzer is Csmith; an example of a dumb fuzzer is AFL. Clearly, both approaches have worked well in certain spaces!

QuickCheck mostly falls on the "smart" side of the fence; generators tend to be type-aware. It certainly doesn't have to be this way! I suspect most QuickCheck clones fall into the "smart" camp because they tend to be implemented in languages where unchecked bit fiddling of arbitrary data types is frowned upon. Also, QuickCheck tends to be run on functions (which have a few arguments of specific types) rather than programs (which tend to operate on streams).

However, this means that QuickCheck users must often write generators, while AFL users can typically fuzz anything by writing a fairly simple test harness.

Instrumentation

It's not unheard of to do instrumentation of systems under test in unit testing -- code coverage tools, for example -- but the sort of compile-time instrumentation that AFL does is probably out of the question. Fizil instruments compiled binaries rather than ASM as AFL does, but still probably isn't a good fit for the test runner. I've never really looked at what the Visual Studio Test Code Coverage tool does. I suspect it uses the VS profiler. On one hand, it's a better fit for this use case. On the other hand, it's highly coupled to VS.

Genetic Algorithms

AFL uses genetic algorithms to find new paths in the system under test. However, it does this after roughly 24 hours of deterministic tests.

I'm not familiar with a QuickCheck implementation which uses genetic generators. It might work, but what would the signal be? QC doesn't trace paths, and it tends to stop on the first negative signal.

One approach to implementing genetic selection of inputs would be to trace paths, although this is complicated (see above). Another approach would be to use the property to find multiple negative results (in much the same way that compilers don't tend to stop after the first error). Would this make QuickCheck better? Probably not if it's only testing a single function. If you, like AFL or Fizil, were testing a whole program, probably.

Speed

If you want to make property based testing better, make it faster. End of story. Imagine you could mark a property as parallelizable. That give you a free 4-8* improvement on the average system.

Z3

Now this is something altogether different. Z3 can be useful in property-based testing (see CutEr, but it's research-level work right now.

AnthonyLloyd commented 7 years ago

Is there a .Net Z3 / concolic library? There is Pex but is it dead or only in VS?

haf commented 7 years ago

I've used this before with great success https://github.com/dungpa/Z3Fs

AnthonyLloyd commented 7 years ago

Looks very good. Don just tweeted it.

I guess its a lot of work to go from Z3 to something like CutEr? I thought Pex was this but I need to investigate more.