Bytecode instrumentation vs. settrace

pd-fkie commented 3 years ago

Hello atheris team,

I would like to propose an improvement of atheris that can at least double the execution speed of your fuzzer.

In atheris you use sys.settrace for coverage
collection but this is not the fastest approach.
It is possible to instrument .py files the way AFL
instruments .c files in order to get rid of all the runtime overhead.
At the beginning of each basic
block a call to atheris.log(idx) can be inserted
that like __afl_maybe_log(idx) treats idx as an
offset into the counter-region and increments the
corresponding byte. This way the number of instrumented
locations is known at compile-time and a call to
atheris.reg(num) can be inserted at the very beginning
of the module that tells atheris how many counters
this module needs.
At the beginning of a fuzz target where all modules
are imported atheris collects the atheris.reg calls
and keeps track of the overall number of counters needed.
In atheris.Fuzz() a memory region of a suitable size
can be allocated and used as the region for counters.

This is perfectly compatible with your
approach of fuzzing C/C++ extensions. The extensions
just have to be built with -fsanitize-coverage=inline-8bit-counters.

However this is not perfect. It does not support

data-flow guided fuzzing
differential fuzzing since the counters of each module start at 0

But I can image these limitations are not hard to overcome.

Here is a POC I've built that implements the concept described above: python-fuzz-poc I've used the POC to fuzz some libraries and found quite some bugs with it and the results are very promising. On average I get a 5x performance boost in contrast to sys.settrace.

I am very interested in your opinion about this approach. What do you think of this idea?

TheShiftedBit commented 3 years ago

Interesting, this is excellent. We considered using a similar approach, via a modified CPython runtime rather than after-the-fact bytecode editing. We ultimately decided on an extension in an attempt to make Atheris as easy-to-use as possible. (It's already complicated enough.)

A 5x performance improvement is quite significant. Do you know if it's reasonable to transform code objects at runtime rather than by modifying .py / .pyc files? I assume so. If so, that would probably be the best way to do this. That way, Atheris can still be an extension providing any shared code, but coverage calls can be added where needed without a tracer.

This also means we can support more interesting tracing (like string comparison) in Python versions before 3.8, which would be excellent. I was considering implementing better regular expression coverage via monkey-patching anyway.

Could you explain more about what you meant by "It does not support...differential fuzzing since the counters of each module start at 0"?

pd-fkie commented 3 years ago

Transforming code objects of modules is perfectly possible.
I added a new method instrument(module) which gets a module object as a parameter, retrieves the code object of the module, instruments the code and creates a new module with the same name but with the instrumented code.

This can look e.g. like this:

import sys
import pefile
import lf

def test_one_input(data):
    ...

if __name__ == "__main__":
    lf.instrument(pefile)              # <-- NEW
    lf.fuzz(test_one_input, sys.argv)

(See the latest version of my POC)

Could you explain more about what you meant by "It does not support...differential fuzzing since the counters of each module start at 0"?

I meant that there cannot be more than one instrumented module in the module list at a time. But this problem has been solved by the in-memory instrumentation.

Update: After adding instrumentation for data-flow tracing the performance benefit dropped from 5x to 2x.

TheShiftedBit commented 3 years ago

Just wanted to update you, this is on our roadmap :)

TheShiftedBit commented 3 years ago

Question: would you like to merge this into Atheris, or would you prefer we do it? If you want to do it, could you provide a well-documented PR?

pd-fkie commented 3 years ago

At the moment I wouldn't like to merge it into atheris because

It's incomplete (not all branches get instrumented, some basic blocks get instrumented twice)
It's only for python3.8
It's incredibly slow

but I am happy to create a PR when all of those deficiencies have been fixed.

TheShiftedBit commented 3 years ago

Yes, absolutely!

google / atheris

Bytecode instrumentation vs. settrace #16