Hypothesis produces poor results

google / atheris

Hypothesis produces poor results #20

Open TheShiftedBit opened 3 years ago

TheShiftedBit commented 3 years ago

I'm writing end-to-end tests to make sure Python coverage is high quality. In doing so, I discovered that Hypothesis structured fuzzing gives really poor fuzz quality; even the example in the README doesn't work:

import sys

import atheris
from hypothesis import given, strategies as st

@given(st.from_regex(r"\w+!?", fullmatch=True))
@atheris.instrument_func
def test(string):
  assert string != "bad"

atheris.Setup(sys.argv, atheris.instrument_func(test.hypothesis.fuzz_one_input))
atheris.Fuzz()

I checked, and this isn't caused by the new coverage method - this works poorly with old coverage too. Doing this with regular Atheris, however, works excellently.

@Zac-HD, as the original contributor of the Hypothesis examples: do you have any suggestions here? I was thinking something along the lines of an external mutator for libFuzzer might work to fix the issues here. That's how libprotobuf-mutator for C++ works. @nedwill your input might also be helpful here.

nedwill commented 3 years ago

How do mutations work with Hypothesis? I assumed it only did generation, not mutation of existing test cases. This example does show one of the challenges with a Hypothesis strategy as expressive as arbitrary (?) regex. I haven't seen it done before, but intuitively a mutator for regex might involve matching parts of the seed string to different states in the regex FSM and adding/replacing/editing parts of the input without producing a string that won't be accepted. There may already be logic in Hypothesis to do this, unless it randomly generates strings and checks whether they match, so it may not be too bad.

IMHO, I would just write a mutator for the subset of strategies for which inputs are a simple tree structure and warn the user not to use regex for fuzz testing.
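
The kind of structure-preserving mutation described above can be sketched for the single regex from the example, `\w+!?`. This is a hand-written illustration, not a general regex-FSM mutator and not how Hypothesis works internally; `mutate_word_bang` and `WORD_CHARS` are hypothetical names, and only an ASCII subset of `\w` is used for new characters.

```python
import random
import re
import string

PATTERN = re.compile(r"\w+!?")
# ASCII subset of \w; Python's \w also matches many non-ASCII characters.
WORD_CHARS = string.ascii_letters + string.digits + "_"


def mutate_word_bang(text: str, rng: random.Random) -> str:
    # Apply one small edit so that the result still fully matches PATTERN.
    if not PATTERN.fullmatch(text):
        # Invalid seed: restart from a minimal valid string.
        return rng.choice(WORD_CHARS)
    # Split the input into its two regex parts: the word and the optional "!".
    bang = text.endswith("!")
    word = text[:-1] if bang else text
    op = rng.choice(["insert", "replace", "delete", "toggle_bang"])
    if op == "insert":
        i = rng.randrange(len(word) + 1)
        word = word[:i] + rng.choice(WORD_CHARS) + word[i:]
    elif op == "replace":
        i = rng.randrange(len(word))
        word = word[:i] + rng.choice(WORD_CHARS) + word[i + 1:]
    elif op == "delete" and len(word) > 1:
        # Never delete the last word character, since \w+ needs at least one.
        i = rng.randrange(len(word))
        word = word[:i] + word[i + 1:]
    elif op == "toggle_bang":
        bang = not bang
    # A "delete" on a one-character word falls through and leaves it unchanged.
    return word + ("!" if bang else "")
```

Because every edit is applied to one part of the parsed input, no output ever has to be rejected and regenerated.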

TheShiftedBit commented 3 years ago

A note: I'm removing references to Hypothesis from the repo, at least for now. It's really, really bad.

Now that we have more control of the coverage system, I actually plan to revive https://github.com/google/atheris/issues/5, which might make regexes work way better.

rmonat commented 2 years ago

Just to be sure: does this issue apply to all generators created by Hypothesis, or only to the regex-specific one?

I tried the code below and seem to hit the same performance issues.

import atheris, sys

from hypothesis import given, strategies as st

@given(st.text())
@atheris.instrument_func
def test(string):
  assert string != "bad"

atheris.Setup(sys.argv, atheris.instrument_func(test.hypothesis.fuzz_one_input))
atheris.Fuzz()

TheShiftedBit commented 2 years ago

All generators created by Hypothesis.

Atheris now supports custom mutators, so that might be a better solution.
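
As a rough sketch of what the custom mutator hook looks like: Atheris accepts a `custom_mutator` callable taking `(data, max_size, seed)` and exposes `atheris.Mutate` for the default byte-level mutation. The example below is not Hypothesis-aware; it only shows the shape of the hook by keeping every input valid UTF-8. The helper `make_text_mutator` and the inner harness are hypothetical names, and the byte-level mutator is passed in as a parameter so the logic can be tried without Atheris installed.

```python
import sys
from typing import Callable


def make_text_mutator(
    mutate_bytes: Callable[[bytes, int], bytes],
) -> Callable[[bytes, int, int], bytes]:
    # Build a custom mutator that only ever returns valid UTF-8 bytes.
    def custom_mutator(data: bytes, max_size: int, seed: int) -> bytes:
        # Let the byte-level mutator propose a candidate (seed is unused here).
        candidate = mutate_bytes(data, max_size)
        # Repair the candidate: cut to max_size, then drop invalid sequences.
        text = candidate[:max_size].decode("utf-8", errors="ignore")
        return text.encode("utf-8")

    return custom_mutator


def main() -> None:
    # Imported lazily: Atheris is a third-party package.
    import atheris

    def test_one_input(data: bytes) -> None:
        assert data.decode("utf-8", errors="ignore") != "bad"

    atheris.Setup(
        sys.argv,
        atheris.instrument_func(test_one_input),
        # atheris.Mutate is assumed to have the (data, max_size) signature.
        custom_mutator=make_text_mutator(atheris.Mutate),
    )
    atheris.Fuzz()
```

A mutator that actually helps Hypothesis would need to understand how `fuzz_one_input` interprets its bytes, which this sketch does not attempt.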