Profiling formal verification

alastairreid commented 3 years ago

For a symbolic execution tool like KLEE, the major cause of problems is path splits that lead to path explosion: KLEE basically enumerates all viable paths and most programs have an exponential growth in number of paths as data sizes, etc. increase. For a model checker, the problem is state-space explosion caused by path splits and path joins.

Inspired by Bornholt's Finding Code That Explodes under Symbolic Evaluation, and by Cristian Cadar's blog post about measuring coverage, is it possible to identify verification hotspots that are causing verification efficiency problems?

e.g., in KLEE, we could consider using KLEE's profiling support to identify hotspots or common sequences in execution traces.

Questions:

Do we want to find the code that explodes or inputs that trigger the explosion?
Does seeing where a hotspot occurs help us come up with a mitigation? (see also #33 and #66)
Are there particular Rust crates, idioms, language features, etc. that cause problems?
Do hotspots and their mitigation for one tool work for other verification tools and/or fuzzers?

(The starting point is probably a literature search and/or twitter query to find what others have done.)

indo-dev-0 commented 3 years ago

WRT limiting state-space explosion, coverage-guided/structure aware fuzzing is the new hotness in academia (see OpFuzz, FuzzChick, and FuzzChick-citations). In Rust land, FuzzCheck-rs is the only structural fuzzer that I'm aware of. The author claims that it finds bugs faster than proptest and the readme has an impressive example, but they still haven't published quantitative benchmarks.

My apologies if this isn't helpful, my understanding of this space is limited. However, I didn't see either FuzzChick or FuzzCheck mentioned in your literature reviews.

alastairreid commented 3 years ago

Yes, in the world of fuzzing, combining coverage-guided and structure-aware is clearly the way to go. Being structure-aware is especially important to us because it allows us to use a single API for both fuzzing and verification. In Rust, we have had to choose one or the other - we had been wondering whether anybody was going to combine the two. (Interestingly, Hypothesis (that proptest is based on) does combine the two and has many interesting heuristics to find bugs quickly with low human effort.)

Thanks for the pointers to fuzzers - I'll look into them and update my surveys.

On the other hand formal verification tools are always based on a deep analysis of the code being verified. Tools like KLEE have a lot of different heuristics for trying to explore the most interesting part of the code (eg to maximise coverage, to cover the code that formed part of the latest commit, etc.). Those heuristics tend to focus on how to maximise the likelihood of finding a bug by intelligently choosing what subset of possible executions to examine. If we want to go further and get some guarantees, we need the tools to consider all possible executions. On small pieces of code, this is often (but not always) possible. Hence the idea of profiling the (formal) verification process to try to identify what parts of the code are causing problems so that can be done to make verification more feasible.

alastairreid commented 3 years ago

This is now implemented for KLEE by #89 and #107

1) KLEE already generated profiling data in klee_last/run.istats which is in the kcachegrind format 2) Using the rust2calltree tool (#89), we can demangle Rust function names 3) From within docker (also #89), we can run kcachegrind to view and explore the profile 4) Using the number of forks, we can identify bottlenecks 5) Using the concretize! macro (#107), we can merge paths that are forking too much. (See demos/bottlenecks/merging/src/main.rs for example)

project-oak / rust-verification-tools

Profiling formal verification #67