ossf / fuzz-introspector

Fuzz Introspector -- introspect, extend and optimise fuzzers
https://fuzz-introspector.readthedocs.io
Apache License 2.0
368 stars 54 forks source link

possible alternatives for cyclomatic complexity #303

Open swirsz opened 2 years ago

swirsz commented 2 years ago

https://link.springer.com/chapter/10.1007/978-3-319-30840-1_16 Improving Fuzzing Using Software Complexity Metrics

"evaluated 25 metrics: Lines of code count (LOC), basic blocks count (BBLs), procedure calls count (CALLS);Jilb metric (Jilb) [16], ABC metric (ABC), Cyclomatic complexity (CC) [17], Modified cyclomatic complexity (CC_mod)[16], density of CFG (R) [18], Pivovarsky metric (Pi) [16], Halstead metrics for code volume (H.V), length and calculated length (H.N, H.N∗), difficulty (H.D), effort (H.E), the number of delivered bugs (H.B) [19];Harrison and Magel metric (Harr) [20], boundary values metric (Bound), span metric (Span), Henry and Cafura metric (H&C) [21], Card and Glass metric (C&G) [22], Oviedo metric (Oviedo) [23], Chapin metric (Chapin) [24]; Cocol metric (Cocol)"

"Our experiments on the 104 vulnerable applications have shown that Halstead B metric demonstrates maximum effectiveness to find vulnerable routines in comparison with other metrics. We also proposed our own metric based on Halsted B which shows more stable results. The experimental results of effectiveness assessment have shown viability of our approach and allowed to reduce time costs for fuzzing campaign by an average of 26–28 % for 5 well-known fuzzing systems."

Another alternative: https://arxiv.org/abs/2103.00862 IntelliGen: Automatic Driver Synthesis for FuzzTesting

Idea is to rank functions based on:

  1. Number of pointer dereferences
  2. Number of calls to memory related functions
  3. Calls other functions in the same library (indicates close to the root call)
DavidKorczynski commented 2 years ago

Good ideas!

3. Calls other functions in the same library (indicates close to the root call)

fyi in the functions overview tables we have the columns "Functions reached" and "Reached by functions" which show data about this. And we use this indeed to locate functions close to root.

I think ultimately I would prefer to include the data points you mention into the functions overview tables because then the data will be shown for each function and we can use sorting of the data in the columns to create the ranking you described.

Navidem commented 2 years ago

Thanks @swirsz for the summaries. We definitely can consider replacing cyclomatic complexity with a more effective one, if it does not introduce significant slow down.