Equivalence prover - Githubissues

We want to bring back a KEQ like tool which enables proving that two programs in the same (or different) languages are equivalent.

We'll start with one that assumes the same programming language between the two executions.

Implementation ideas

Proof/Prover classes are here: https://github.com/runtimeverification/pyk/tree/master/src/pyk/proof
Make a new file equivalence.py, which has EquivalenceProof and EquivalenceProver
EquivalenceProof has data:
- Two KCFG datastructures, for each execution.
- List of pairs of node which are in "loose" equivalence relation.
- List of pairs of nodes which are in "tight" equivalence relation.
EquivalenceProver class has data:
- An EquivalenceProof
- A heuristic (semantics specific, program independent): should_be_equivalent: Callable[[CTerm, CTerm], bool]
- A predicate (user supplied program specific, or can supply default language specific program generic variant): are_equivalent: Callable[[Iterable[CTerm], Iterable[CTerm]], bool].
EquivalenceProof.advance_proof does:
- Execute both KCFGs to completion using KCFGExplore.extend, similar to APRProver.advance_proof
- Pairwise establish if should_be_equivalent holds between nodes in each KCFG.
- For the nodes with the same LHS or same RHS between the pairs which passed the first predicate, establish that the tighter second predicate holds over their disjuncts.

An example we talked about:

$s = 3;
if (B) { $s = $s + 3 ; } else { $s = $s + 2 ; $s = $s + 1 };

ASTART => A1
A1 => A2
A2 => A3
A3 => BRANCH
BRANCH => END1
BRANCH => END2

$s = 3;
$s = $s + 3;

BSTART => END

Equivalence: ==P === $s ends the same.

Loose "should be same" predicate: (ASTART, BSTART), (END1, END), (END2, END) Tight "is the same" predicate: ASTART ==P BSTART, END1 \/ END2 ==P END

Some reference literature from @daejunpark

The general approach then was:

Make a product semantics of the two (join their configurations).
Insert no-op "cut" instructions into both programs (the LLVM -> x86 code generator was instrumented directly to do this programatically, so these points were identified mechanically and automatically, but specific to LLVM/x86).
Execute one program (eg. LLVM program) on all paths until it hits the first set of "cut" instructions, call these states "cut-points".
Execute the other program (eg. x86) on all paths until it hits the first set of "cut" instructions.
Establish which cut-point on teh LLVM corresponds to which on on x86.
Establish the tighter equivalence relation between the two programs at these points, even a user-supplied heuristic specific to the program needs.

This worked pretty well, and I think even scaled to some larger algorithms. The linked paper is an early version of the work, which did not make it to publication (other than in Daejun's thesis). But eventually they made it work between LLVM and x86, not just LLVM -> optimized LLVM. Unfortunately, I think the larger work never really made it to publication. Here is Daejun's thesis: https://www.ideals.illinois.edu/items/111906

The biggest challenge this approach faced was the "make a product semantics of the two". This is difficult for all sorts of technical reasons, mostly related to joining the syntax (overlapping productions) and semantics (conflicting rules) of the two semantics in a way that didn't change the behavior of each other. The facilities the K compiler provides for this are very limited. Our approach will be different, as described. We'll instead just build the execution graph of each one separately, and then have the semantics-specific predicate saying "these two points should be in the simulation/bisimulation relation" run over all the nodes in the two graphs. Finally, after establishing that relationship, we'll check that the tighter simulation predicate (user-supplied, or we provide a default semantics-specific one) holds over those pairs of nodes.

runtimeverification / k

Equivalence prover #4217

Implementation ideas

An example we talked about:

Some reference literature from @daejunpark