zetzit / zz

πŸΊπŸ™ ZetZ a zymbolic verifier and tranzpiler to bare metal C
MIT License
1.6k stars 52 forks source link

Theory behind ZZ #12

Open suhr opened 4 years ago

suhr commented 4 years ago

The README is extremely vague on this. How does it actually work?

Specifically, I have the following questions:

aep commented 4 years ago

Hi, yes readme could expand a little on that. Thanks for the questions.

How are symbolic execution and SMT related?

the user code is translated into SSA form [1], which is mathematical expressions instead of imperative statements. these expressions are then mixed with assertions that need to be held (such as must-not-overflow). The SMT solver will then try to find counterexamples under which the code will violate the assertions. If none are found, the code is fine.

i'd say the major difference to other into-SMT mechanisms is that the SMT form is actually Q_UFBV which means it emulates a concrete machine behaviour rather than an abstract set, including overflow and rounding errors.

How do you ensure that execution is finite?

it doesn't and cannot check if your program will ever finish. [2] the symbolic execution itself will halt because it is tested over the SSA rather than imperative execution. Some undecidable checks like for-all are probably tested heuristically, but that's up to the SMT solver.

How ZZ is related to Hoare-Floyd logic?

it does derive from that idea, yes. similar to dafny.

How ZZ is related to refined types?

i think you're reffering to the haskell thing? essentially zz has the same functionality built in and with much greater expressive power. Not only can you restrict type instantiation to a range of parameters but also to other typestate, execution order, internal state, global state, and even arbitrary free-functions. it is however limited in terms of types, because C doesn't really have a powerful type system.

Does ZZ suffer from the quantifier problem?

zz feeds everything you type into the SSA, so the SMT solver can sometimes derive the nessesary quantifiers, but it will SSA is fundamentally undecidable and the SMT solver will need help from the user alot.

The link on liquidhaskell you posted is very interesting. i will dig further into it and see what can be learned from it

[1] https://en.wikipedia.org/wiki/Static_single_assignment_form [2] https://en.wikipedia.org/wiki/Halting_problem

suhr commented 4 years ago

the symbolic execution itself will halt because it is tested over the SSA rather than imperative execution

Could you elaborate? SSA graph may have cycles, so it's possible to walk it infinitely. I guess you can reach some kind of fixed point while evaluating, allowing to solve in cycles?

aep commented 4 years ago

ah yes, there's some tricks:

  1. goto isn't available
  2. loops are supposed to be tested heuristic as for-all(N where N is loop-cond, loop-body), (i didnt implement it yet, currently its just a single pass, which is technically wrong)
  3. recursion requires manual attestation of a base case, something haskell has built in, but C does not so you need to manually intervene.
  4. even then SMT may occasionally not be solveable. it is terminated then by a timer and the user will be required to add additional constrains. unfortunately there's limits and manual attestation is quite common in practical zz
jsjolen commented 4 years ago

@suhr. Yes, this is an issue for abstract interpreters in general. A solution is to add loop invariants. Another is to allow cycling for n times (clearly unsound).

@aep For the symbolic execution, what kind of underlying logic do you use? What's the assertion language for the path condition :-)?

aep commented 4 years ago

For the symbolic execution, what kind of underlying logic do you use? What's the assertion language for the path condition :-)?

hopefully i understand the question correctly. path conditions are implemented as if-this-then expression to every following temporal assignment

so

if ( a == b ) {
    c = d;
    e = f;
}

becomes

(ite (= a1 b1) (= c2 d1) (= c2 c1) )
(ite (= a1 b1) (= e2 f1) (= e2 e1) )

btw you can inspect the SMT by looking into the files in target/ssa/*

bitwave commented 4 years ago

Maybe looking into the Verified Concurrent C Compiler from Microsoft could be interesting. They used Z3 as SMT solver and hoare calculus using First Order Logic for assertions. They even had an interesting model for concurrent access from different threads. IIRC they used it to verify Hyper-V.

See: https://github.com/microsoft/vcc

abustany commented 4 years ago

Chiming in here (hello Arvid! long time no see :) ), this project also reminds me strongly of FramaC, which allows proving pretty high level properties on your code. FramaC is however still quite immature (or was when I used it 1 year ago) and messy at times... It's a very active project though, so maybe it's more stable now!

aep commented 4 years ago

Adrieeen! i'd suggest a catch up meeting but covid so nope.

frama looks a lot more complete than what zz has at this point. unfortunately the code has a messy license, so that's not something that's reusable, but the associated papers sure are something i want to read.