google / atheris

Pass the target to Fuzz() instead of Setup() #27

Closed by gdiscry 2 years ago

gdiscry commented 2 years ago

For my projects, I implemented a generic fuzzer based on Atheris that can target any function with:

$ myfuzzer [-libFuzzer_args ...] --target=package.module:function [corpus ...]

The options with a single dash are consumed by Atheris (for libFuzzer), the options with two dashes are used by my fuzzer, and the positional arguments are the corpus (used by both libFuzzer and my fuzzer). Depending on some options, my fuzzer will either call atheris.Fuzz() or perform another action using the target and the corpus files. Here is a simplified version that only calls atheris.Fuzz():

#!/usr/bin/env python
import argparse
import sys

import atheris

@atheris.instrument_func
def fuzz_target(data):
    """Dummy target."""
    return data

@atheris.instrument_func
def fuzz_proxy(buffer):
    fuzz_target(buffer)

def load_target(spec):
    module_name, _, function_name = spec.partition(":")
    with atheris.instrument_imports():
        import importlib
        module = importlib.import_module(module_name)
    import functools
    return functools.reduce(getattr, function_name.split("."), module)

def main(args=None):
    global fuzz_target
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", type=load_target, required=True)
    parser.add_argument("--action", default="fuzz")
    parser.add_argument("corpus", nargs="*")
    args = parser.parse_args(args)
    if args.action == "fuzz":
        fuzz_target = args.target
        atheris.Fuzz()

if __name__ == '__main__':
    main(atheris.Setup(sys.argv, fuzz_proxy)[1:])

The separation between Setup() and Fuzz() is useful for my use case. However, I cannot understand why the target must be passed to Setup(). I have read the code of Setup() and Fuzz() and found nothing explaining why the target is required as early as Setup(): the target is simply stored by Setup() until Fuzz() is called.

As seen above, I have to jump through hoops to fuzz the real target by setting up a proxy target. If the target were passed to Fuzz() instead of Setup(), the code would be greatly simplified:

#!/usr/bin/env python
import argparse
import sys

import atheris

def load_target(spec):
    module_name, _, function_name = spec.partition(":")
    with atheris.instrument_imports():
        import importlib
        module = importlib.import_module(module_name)
    import functools
    return functools.reduce(getattr, function_name.split("."), module)

def main(args=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--target", type=load_target, required=True)
    parser.add_argument("--action", default="fuzz")
    parser.add_argument("corpus", nargs="*")
    args = parser.parse_args(args)
    if args.action == "fuzz":
        atheris.Fuzz(args.target)

if __name__ == '__main__':
    main(atheris.Setup(sys.argv)[1:])

This change wouldn't make things more complicated when the target is hard-coded, and could be introduced in a backward-compatible way.

TheShiftedBit commented 2 years ago

Hi Georges, you're right, it would be possible for Fuzz() to take the callback instead of Setup(). Unfortunately, doing so in a backwards-compatible way would mean adding extra complexity to the API and the argument handling, because Atheris would have to support the callback argument in either place. If this enabled additional features, that might be worth it; however, as you mentioned, you can just use a wrapper function to solve the problem for your use case. I don't think it's worth adding the complexity to Atheris.
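The wrapper approach mentioned above can also be written without a `global` statement. The following is a minimal sketch in plain Python; `LateBoundTarget` and its `bind()` method are hypothetical names, not part of Atheris.

```python
class LateBoundTarget:
    """Callable placeholder whose real target is chosen later.

    Hypothetical helper for illustration; not part of Atheris.
    """

    def __init__(self):
        self._target = None

    def bind(self, target):
        """Select the function that later calls are forwarded to."""
        if not callable(target):
            raise TypeError("target must be callable")
        self._target = target

    def __call__(self, data):
        """Forward one input to the bound target."""
        if self._target is None:
            raise RuntimeError("no target bound yet")
        return self._target(data)
```

In the first script of this thread, an instance would take the place of the `fuzz_proxy`/`global fuzz_target` pair: create it before Setup(), then call `bind(args.target)` once the arguments are parsed. Whether Atheris accepts an arbitrary callable object in that position is not verified here; a plain function that delegates to the instance would sidestep the question.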