risicle / cpytraceafl

CPython bytecode instrumentation and forkserver tools for fuzzing pure python and mixed python/c code using AFL
MIT License
28 stars 4 forks source link

Instrumentation could be faster #1

Open vanhauser-thc opened 4 years ago

vanhauser-thc commented 4 years ago

I was looking at tracehook_line_trace_hook in _tracehookmodule.c - the instrumentation you are doing is overly complex and therefore is very, very time consuming.

if the bytecode offset is something that does not change during fuzzing, why not solely use this information? maybe do some assessment of the value and e.g. if its always of minimum increments of 8 then do a(bytecode_offset >> 3) % MAP_SIZE and be done with it. and if it is important that a lineno exits just check that it exists? IMHO the multiplication just decreases speed and mixing the different values gives no benefit.

Try to remove everything that is not absolutely needed or can be precomputed somewhere else.

also this is useless:

uint32_t this_loc = state >> (32-afl_map_size_bits);

if the map size would change during a run, afl would not know about it anyway.

risicle commented 4 years ago

I was looking at tracehook_line_trace_hook in _tracehookmodule.c - the instrumentation you are doing is overly complex and therefore is very, very time consuming.

Well, because I'm working with CPython, I rather have the luxury of not having to worry about micro-optimizations because in the grand scheme of things, the branches and multiplies I add here are negligible in their impact - doing the three python attribute lookups in this tracehook each result in many hundreds of instructions, with much branching and pointer chasing.

The bytecode offset is a code-object-local offset (code objects are like functions), so if I just used that I would just end up with a bunch of numbers usually not going past a couple hundred and we'd be in collision city.

A value more likely for direct use might be f_lineno, because it doesn't really measure line number at all - I'm abusing it to count basic blocks and it has a per-code-object pseudorandom base offset. It would provide clusters of consecutive values at random places in the map space, which I guess could be alright. If I think about it, the traditional wisdom I've always received about making sure your hash functions are nicely mixed is usually about their use in hash tables, where clusters of values will result in overweighted buckets - but there are no buckets in AFL's map so maybe it doesn't matter. But then again, the first thing AFL does with a prev_loc value is >>1, so there's one of the bits you were relying on gone already.

So, the reason for the mixing is mostly paranoia - but I think it's affordable paranoia given the context - I'd rather a tiny slowdown than have me wiping out all my entropy through some stupid bit shifting mistake.

I will continue to think about how necessary this really is though, and definitely appreciate the analysis of my code :+1:

if the map size would change during a run, afl would not know about it anyway.

Maybe not, but my tests will :grin:. One bit of that will otherwise get shifted into the final result through the prev_loc>>1 and that will cause my tests to fail. Another affordable laziness I'm afraid.

risicle commented 4 years ago

Damn, you've just made me realize that it will cope with a decrease in map size, but not an increase. Maybe it is useless.

vanhauser-thc commented 4 years ago

I would recommend to compare the speed of the existing implementation and my proposed changes. I understand your paranoia but IMHO you just decrease speed and increase the chance for collisions in the map and no getting anything positive from it. but this is always easily arguable in both directions. only real tests show the impact and difference :)

risicle commented 4 years ago

Absolutely - I'll probably look into this next time I'm getting frustrated with the fuzzing speed of a python-heavy program. As it is, most of my current targets spend the majority of their time in native extensions.

risicle commented 4 years ago

(though I'm not sure how mixing etc will increase collisions...?)