Open DavidKorczynski opened 2 years ago
Additional heuristics to identify interesting functions to fuzz can be:
Functions where user-controlled (untrusted) data reaches critical operations such as memory allocation/deallocation or pointer arithmetic. To realize this, we could use taint analysis to track how untrusted data propagates through the code. We may also need an interface that lets developers tag data or function results as untrusted.
Functions that perform error-prone operations in general (for example, functions that call pointer-returning functions, in the hope of catching errors such as null dereferences). For this we may need a function profiler.
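To make the taint idea concrete, here is a minimal sketch of what such a heuristic could look like. It is not fuzz-introspector code: the `Stmt` representation, the `CRITICAL_OPS` set and the function names are all hypothetical, and the propagation is flow-insensitive and intraprocedural, which a real analysis would need to improve on.

```python
from collections import namedtuple

# Hypothetical, simplified IR: each statement assigns `dest` from `op(args)`.
Stmt = namedtuple("Stmt", ["dest", "op", "args"])

# Assumed set of "critical" operations; a real tool would make this configurable.
CRITICAL_OPS = {"malloc", "free", "memcpy", "ptr_add"}


def tainted_sinks(stmts, untrusted):
    """Return the critical operations in `stmts` that receive tainted data.

    `untrusted` is the set of variable names a developer tagged as untrusted.
    """
    tainted = set(untrusted)
    changed = True
    # Iterate to a fixed point so that statement order does not matter.
    while changed:
        changed = False
        for s in stmts:
            if s.dest not in tainted and any(a in tainted for a in s.args):
                tainted.add(s.dest)
                changed = True
    return [s.op for s in stmts
            if s.op in CRITICAL_OPS and any(a in tainted for a in s.args)]


def rank_functions(program, untrusted):
    """Map each function to its tainted critical operations, dropping empty ones."""
    hits = {name: tainted_sinks(stmts, untrusted) for name, stmts in program.items()}
    return {name: ops for name, ops in hits.items() if ops}


# Toy program: `parse_header` sizes an allocation from input, `init_table` does not.
program = {
    "parse_header": [
        Stmt("len", "read_u32", ["input"]),
        Stmt("size", "mul", ["len", "four"]),
        Stmt("buf", "malloc", ["size"]),
    ],
    "init_table": [
        Stmt("tbl", "malloc", ["const_size"]),
    ],
}

print(rank_functions(program, untrusted={"input"}))
```

With `input` tagged as untrusted, only `parse_header` is reported, because its `malloc` size is derived from the tagged value.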
We can generalise this issue and consider it to be improvements focused around "input to fuzz engines". This could, for example, include features such as automated dictionary generation by statically analysing code and data in the target.
I added a proof-of-concept for this. The focus atm is on dictionary generation by looking at constants used by each fuzzer in its reachable functions. Sample code for extracting strings used in fuzzer-reachable functions is here: https://github.com/ossf/fuzz-introspector/blob/28d5ee4e9be42e962eb21f7b2dc4cb6fdd02a95b/llvm/lib/Transforms/Inspector/Inspector.cpp#L767-L824
I have also refactored the post-processing code so it is easier to develop individual analyses. The goal is to facilitate a plugin-like interface that makes it easy to rapidly develop new analysis techniques that rely on data collected from fuzz-introspector. Sample code in the post processor is here: https://github.com/ossf/fuzz-introspector/blob/main/post-processing/fuzz_html.py#L604-L622
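As a rough illustration of what a plugin-like interface could look like, here is a minimal sketch. It does not reflect the actual structure of `fuzz_html.py`: the `Analysis` base class, the `register` decorator, the example analysis and the shape of the `profile` dictionary are all assumptions made for the example.

```python
from abc import ABC, abstractmethod

# Hypothetical registry mapping analysis names to their classes.
REGISTRY = {}


class Analysis(ABC):
    """Assumed base class that every analysis plugin implements."""

    name = "unnamed"

    @abstractmethod
    def run(self, profile):
        """Take the collected fuzzer data and return a result."""


def register(cls):
    """Class decorator that adds an analysis to the registry."""
    REGISTRY[cls.name] = cls
    return cls


@register
class UnreachedFunctions(Analysis):
    """Example plugin: list functions that no fuzzer reaches."""

    name = "unreached-functions"

    def run(self, profile):
        return sorted(set(profile["all_functions"]) - set(profile["reached_functions"]))


def run_all(profile):
    """Run every registered analysis against the same collected data."""
    return {name: cls().run(profile) for name, cls in REGISTRY.items()}


# Hypothetical profile data; the real data format is not shown in this thread.
profile = {
    "all_functions": ["parse", "decode", "cleanup"],
    "reached_functions": ["parse"],
}
print(run_all(profile))
```

Under this design, adding a new analysis only means writing one more decorated class; the driver does not change.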
I modified simple-example-0 to demonstrate dictionary generation. For example, for the following code:
https://github.com/ossf/fuzz-introspector/blob/28d5ee4e9be42e962eb21f7b2dc4cb6fdd02a95b/examples/simple-example-0/fuzzer.c#L9-L19
the automatic dictionary generator gives the suggested dictionary:
k0="FUZZCAFE "
k1="FUZZKEYWORD "
This can then be used as explicit input to fuzzers via -dict=dictifile. There's more work to be done here.
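For reference, turning a list of extracted constants into a dictionary file could be sketched as below. This is not the proof-of-concept's implementation: `escape` and `make_dictionary` are hypothetical helpers, and the sketch assumes the usual libFuzzer/AFL dictionary syntax of `name="value"` entries with `\\`, `\"` and `\xNN` escapes.

```python
def escape(data: bytes) -> str:
    """Escape a byte string for use inside a quoted dictionary entry."""
    out = []
    for b in data:
        ch = chr(b)
        if ch in ("\\", '"'):
            out.append("\\" + ch)        # backslash and quote must be escaped
        elif 0x20 <= b < 0x7F:
            out.append(ch)               # printable ASCII is kept as-is
        else:
            out.append(f"\\x{b:02x}")    # everything else becomes \xNN
    return "".join(out)


def make_dictionary(constants):
    """Build dictionary text from constants, skipping empties and duplicates."""
    unique = []
    for c in constants:
        if c and c not in unique:
            unique.append(c)
    return "".join(f'k{i}="{escape(c)}"\n' for i, c in enumerate(unique))


# Constants as they might be extracted from fuzzer-reachable functions.
print(make_dictionary([b"FUZZCAFE", b"FUZZKEYWORD", b"FUZZCAFE"]), end="")
```

The resulting text can be written to a file and passed to the fuzzer with the `-dict=` option.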
libFuzzer has the ability to prioritise fuzzing of certain functions. We should use the data from the reachability and coverage analysis to feed information back to the fuzzer about nice-to-analyse functions.
This heuristic could for example be focused around functions that if-hit will: