Are separate calls to `atheris.Setup()` and `atheris.Fuzz()` really necessary?

google / atheris

Apache License 2.0

1.38k stars 111 forks source link

Are separate calls to `atheris.Setup()` and `atheris.Fuzz()` really necessary? #8

Closed Zac-HD closed 3 years ago

Zac-HD commented 3 years ago

Given that Fuzz() accepts no arguments, and Setup() is only useful if you then call Fuzz(), do these really need to be exposed as separate functions at the Python level?

Why not (the equivalent of):

def fuzz(
    args: typing.List[str],
    test_one_input: typing.Callable[[bytes], object],
    *,
    enable_python_coverage: bool = True,
    enable_python_opcode_coverage: bool = True,
) -> typing.NoReturn:
    atheris.Setup(args, test_one_input, enable_python_coverage, enable_python_opcode_coverage)
    atheris.Fuzz()

This could even be provided as a convenience method, with the current API left in place for backwards-compatibility.

TheShiftedBit commented 3 years ago

This is a good point, and something I've thought of before. They are actually separate for a reason though: atheris.Setup() might modify the input argument list, by removing arguments handled by libFuzzer. This allows other argument parsing libraries to subsequently be called and not think there are a lot of excess arguments.

We use this feature inside of Google. We call atheris.Setup() to remove libFuzzer arguments, then pass sys.argv to our argument-parsing code to initialize Google infrastructure, then finally call atheris.Fuzz().

TheShiftedBit commented 3 years ago

Given this, do you think it's still worth supporting arguments in both Setup() and Fuzz()?

Zac-HD commented 3 years ago

There is clearly a use-case for the existing API, but consuming parts of sys.argv before calling non-atheris setup and then starting the fuzzer sounds like the sole advantage of the current approach. I think it would be nice to provide a simple API for users who don't have Google infrastructure to deal with; on the other hand I'm not convinced that adding a second API would actually be a net simplification!

I would suggest adding your explanation of why they're separate (above) to the README, and closing this issue without any functional change.