google / sanitizers

AddressSanitizer, ThreadSanitizer, MemorySanitizer

sanitizer-coverage: trace-cmp-guard request/proposal #1325

Open andreafioraldi opened 3 years ago

andreafioraldi commented 3 years ago

As sancov is now the de facto standard for source-based instrumentation for fuzzing (it is the default in AFL++, libFuzzer, honggfuzz, and more) and its instrumentation for data-flow tracking is widely used, IMO most fuzzers can benefit from knowing the ID of each comparison. The current state of the art is to hash the return address of __sanitizer_cov_trace_cmpX to identify the comparison (e.g. https://github.com/google/honggfuzz/blob/master/libhfuzz/instrument.c#L355), but this suffers from collisions.
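For context, the return-address scheme looks roughly like this. This is a minimal sketch, not the honggfuzz code linked above: the map size and the feedback formula are hypothetical, but the collision mechanism is the same.

```c
#include <stdint.h>
#include <stddef.h>

#define CMP_MAP_SIZE (1u << 16)      /* hypothetical bitmap size */
static uint8_t cmp_map[CMP_MAP_SIZE];

/* With -fsanitize-coverage=trace-cmp the compiler calls this for every
   4-byte comparison; the only site identifier available is the return
   address, so two sites whose hashed PCs coincide share a map entry. */
void __sanitizer_cov_trace_cmp4(uint32_t arg1, uint32_t arg2) {
    uintptr_t pc = (uintptr_t)__builtin_return_address(0);
    size_t idx = (pc >> 4) % CMP_MAP_SIZE;       /* lossy hash -> collisions */
    /* value-profile style feedback: count equal leading bytes (0..3) */
    unsigned eq = (unsigned)__builtin_clz((arg1 ^ arg2) | 1) >> 3;
    if (cmp_map[idx] < eq) cmp_map[idx] = (uint8_t)eq;
}
```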

I propose to adopt the same technique as trace-pc-guard to generate unique IDs (guards) for comparisons, divisions, and GEPs too.

The proposed API is the following (X is the operand size in bytes, of course):

void __sanitizer_cov_trace_cmp_guard_init(uint32_t *start, uint32_t *stop);
void __sanitizer_cov_trace_div_guard_init(uint32_t *start, uint32_t *stop);
void __sanitizer_cov_trace_gep_guard_init(uint32_t *start, uint32_t *stop);

void __sanitizer_cov_trace_cmpX_guard(uint32_t* guard, uint64_t Arg1, uint64_t Arg2);
void __sanitizer_cov_trace_const_cmpX_guard(uint32_t* guard, uint64_t Arg1, uint64_t Arg2);

// for switches: as the number of cases is in Cases[0], I propose to pass an array of guards, one per case
void __sanitizer_cov_trace_switch_guard(uint32_t* guards, uint64_t Val, uint64_t *Cases);

void __sanitizer_cov_trace_divX_guard(uint32_t* guard, uint32_t Val);

void __sanitizer_cov_trace_gep_guard(uint32_t* guard, uintptr_t Idx);
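On the fuzzer side, the init callback could number the guards exactly as trace-pc-guard does. A sketch of a hypothetical runtime consuming the proposed API (the map layout and feedback formula are illustrative, not part of the proposal):

```c
#include <stdint.h>
#include <stdlib.h>

static uint32_t num_cmp_guards;
static uint8_t *cmp_value_map;   /* one slot per cmp site, no collisions */

/* The pass would emit one uint32_t guard slot per cmp site; init numbers
   them 1..N, mirroring the trace-pc-guard init protocol. */
void __sanitizer_cov_trace_cmp_guard_init(uint32_t *start, uint32_t *stop) {
    if (start == stop || *start) return;     /* empty or already numbered */
    for (uint32_t *g = start; g < stop; g++)
        *g = ++num_cmp_guards;               /* unique, collision-free ID */
    cmp_value_map = calloc(num_cmp_guards + 1, 1);
}

void __sanitizer_cov_trace_cmp8_guard(uint32_t *guard, uint64_t a, uint64_t b) {
    /* value-profile feedback: bucket by number of equal leading bytes */
    unsigned eq_bytes = (unsigned)__builtin_clzll((a ^ b) | 1) >> 3;  /* 0..7 */
    cmp_value_map[*guard] |= (uint8_t)(1u << eq_bytes);
}
```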

I think this will add valuable information for fuzzers; the most trivial benefit is precise value-profile feedback without collisions. Let me know what you think.

kcc commented 3 years ago

I was considering this API but then decided against it. It will indeed improve the precision a little bit (how much is unclear; maybe we should run a FuzzBench experiment), but it will add complexity and will still be a relatively shallow solution to the problem.

Consider a comparator function called from multiple places. Both now and with the proposed change, we only capture the direct call site of the instrumentation callback (__sanitizer_cov_...), and thus do not distinguish the call sites of the comparator itself.

Ideally, trace_cmp callbacks would use more than just the caller PC (or index) as the ID. Instead, we could compute an ID from several PCs in the call stack (using an unwinder, a software-maintained shadow call stack like tsan/msan use, or Intel CET once it's widespread).
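A rough sketch of the software shadow-call-stack idea. The push/pop hooks and the mixing constant are hypothetical; in a real implementation the compiler would insert the hooks at function entry/exit.

```c
#include <stdint.h>

static uintptr_t shadow_stack[512];
static unsigned shadow_top;

/* Hypothetical hooks the compiler would emit at call boundaries. */
void __shadow_push(uintptr_t pc) { if (shadow_top < 512) shadow_stack[shadow_top++] = pc; }
void __shadow_pop(void)          { if (shadow_top) shadow_top--; }

/* Combine the cmp's own PC with several caller PCs, so a comparator
   reached from two different places yields two distinct IDs. */
uint64_t cmp_id_from_stack(uintptr_t cmp_pc, unsigned depth) {
    uint64_t h = cmp_pc;
    for (unsigned i = 0; i < depth && i < shadow_top; i++) {
        h ^= shadow_stack[shadow_top - 1 - i];
        h *= 0x9E3779B97F4A7C15ULL;   /* odd constant: injective mixing step */
    }
    return h;
}
```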

If you are confident that the proposed change is important enough, I'd suggest getting numbers from FuzzBench.

Otherwise, I would invest in a shadow call stack (or maybe just a rolling hash of the call stack) maintained alongside sancov.

andreafioraldi commented 3 years ago

It will indeed improve the precision a little bit (how much -- unclear; maybe we should run a FuzzBench experiment),

Yes, totally. We are going to implement it soon in our LTO passes regardless of whether it is implemented in sancov too, and then we will submit a FuzzBench experiment for sure.

Consider a comparator function called from multiple places.

That's true; that's why we also instrument functions whose first two arguments are pointers, guessing that they may be comparator functions (https://github.com/AFLplusplus/AFLplusplus/blob/stable/llvm_mode/afl-llvm-rt.o.c#L1220). However, for value-profile, this is impractical.

Instead, we could compute an ID from several PCs in the call stack

This only covers part of the problem. Hashing the call stack can improve the detection of cmps inside comparator functions, but it will increase the number of false positives: if a cmp is in a non-comparator function called from many different program points, you will easily fill up the map with entries related to the same values.

In addition, on small targets, where most of the time is spent in the fuzzer evaluating the maps, you get a significant speedup because the fuzzer knows the exact number of cmps, which is usually smaller than the default size used for maps with collisions (e.g. 2^64 buckets in our case).

tokatoka commented 5 months ago

Hello, I'm a bit late to join the conversation. Last November I actually implemented call-site-aware cmplog in LibAFL, and we ran it on FuzzBench to compare its performance.

What we did is basically: 1) assign a unique ID to each function; 2) observe the call stack at the current call site, say f1, f2, ..., fn, and XOR the function IDs together: ctx = f1 ^ f2 ^ ... ^ fn. We then take the address of the calling site (rip) and hash both with xxHash (which AFL++ uses by default): cmp_idx = xxhash(ctx, rip); 3) use cmp_idx as the key into the cmp map.
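The key computation described above can be sketched as follows. The splitmix64 finalizer stands in for xxHash here, and the function IDs, rip value, and map size are illustrative:

```c
#include <stdint.h>

/* splitmix64 finalizer: a well-known 64-bit bijective mixer,
   standing in for xxHash in this sketch. */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* ctx = f1 ^ f2 ^ ... ^ fn, then hash ctx with the call-site rip
   and reduce into the cmp map. */
uint64_t cmp_idx(const uint64_t *func_ids, unsigned n, uint64_t rip,
                 uint64_t map_size) {
    uint64_t ctx = 0;
    for (unsigned i = 0; i < n; i++)
        ctx ^= func_ids[i];
    return mix64(ctx ^ mix64(rip)) % map_size;
}
```

Note one property of the XOR fold: it is order-insensitive, so two call stacks containing the same functions in a different order (and pairs of recursive frames, which cancel out) map to the same ctx; the rip term is what distinguishes the final call site.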

And here's the result: https://www.fuzzbench.com/reports/experimental/2023-11-03-libafl/index.html

In general, I would say it's not for every target. As you can see from the overall score, it is worse than plain LibAFL. Note that this is a 23-hour run, where techniques that inflate the corpus queue tend to perform worse (for example, bloaty, freetype2, and harfbuzz are really big targets).

However, for some targets it is useful: for example, if you look at mbedtls, libafl_cmplog_ctx is the best-performing fuzzer.