Possible performance hack

risicle commented 2 years ago

Hi, only just noticed this project. I built something similar a couple of years ago (https://github.com/risicle/cpytraceafl) and thought I'd point out the performance hack it uses, seeing as atheris will likely get far more use than cpytraceafl ever will.

The trick it uses is to simply rewrite the bytecode's lnotab so that the "line number" increments only at the beginning of every new BB. Then it's just a matter of adding the tiny native trace function via sys.settrace() (which internally uses the lnotab to detect the beginning of new "lines").

It's ugly, but it means zero extra bytecode needs to be executed at a trace point, no attributes looked up, and the entire instrumentation process is done in 165 lines.

All based on a Ned Batchelder trick of course.

TheShiftedBit commented 2 years ago

Interesting, we'll have to compare the performance versus the current added instrumentation. Do you know if this technique would break the line numbers that are printed in exceptions and such?

risicle commented 2 years ago

Do you know if this technique would break the line numbers that are printed in exceptions and such?

It most certainly does https://github.com/risicle/cpytraceafl#doesnt-abusing-lnotab-break-pythons-debugging-mechanisms

TheShiftedBit commented 2 years ago

I've done some performance benchmarking. For the lnotab strategy, I modified Atheris as follows:

I replaced Atheris' patch_code function with this solution's rewrite function
In native code, I called PyEval_SetTrace to install a tracer with no effect (except to enable line tracing).

For Atheris's strategy, I used the benchmarks as written, except that I disabled dataflow tracing. Since the lnotab strategy doesn't have dataflow tracing, this makes the comparison fair.

# lnotab rewrite:
low_cyclomatic  runs=40000  time=0.20s  execs/sec=200006.87
high_cyclomatic runs=2000   time=0.62s  execs/sec=3234.16

# Atheris rewrite:
low_cyclomatic  runs=40000  time=0.13s  execs/sec=318779.70
high_cyclomatic runs=2000   time=0.27s  execs/sec=7369.53

The existing strategy used by Atheris is 1.5-2x faster. Atheris was disadvantaged in this benchmark, since my minimal lnotab implementation didn't actually register or increment any counters, reducing the time spent in libfuzzer.

risicle commented 2 years ago

Quite surprised by this - I guess tracehook-calling is not considered a fast-path in cpython (if you're installing any sort of trace hook, I imagine the thought is you're not after top performance), while all the bytecode ops used in the full rewriting are.

google / atheris

Possible performance hack #24