Inconsistent behaviour for `foldr_sum`

EmilyOng commented 10 months ago

Issue

In the following program https://github.com/EmilyOng/AlgebraicEffect/commit/f850fab435562c652750ca57db033cf71c768dd2:

let rec foldr f li acc =
  match li with 
  | [] -> acc 
  | x :: xs -> 
    let acc' = f x acc in 
    foldr f xs acc'

let rec sum li = 
  match li with 
  | [] -> 0
  | x :: xs -> x + sum xs

let foldr_sum1 xs k
(*@ ex r; sum(xs, r); ens res=r+k @*)
= let g c t = c + t in
  foldr g xs k

let foldr_sum2 xs init
(*@ ex r; sum(xs, r); ens res=r+init @*)
= let g c t = c + t in
  foldr g xs init

Running DEBUG=0 dune exec parsing/hip.exe src/examples/later/random_fold.ml gives the following output:

Summary

``` ========== Function: foldr ========== [Raw Post Spec] ens emp; ens li=[]; ens res=acc \/ ex v4; ens emp; ex x xs; ens li=x::xs; ex v0; f(x, acc, v4); ens emp; ex v2; foldr(f, xs, v4, v2) [Normed Post] ens li=[]/\res=acc \/ ex v4 x xs; ens li=x::xs; f(x, acc, v4); ex v2; foldr(f, xs, v4, v2); ens res=v2 [Z3 Time] 256 ms ===================================== ========== Function: sum ========== [Raw Post Spec] ens emp; ens li=[]; ens res=0 \/ ex v37; ens emp; ex x xs; ens li=x::xs; ex v35; sum(xs, v37); ens emp; ens res=x+v37 [Normed Post] ens li=[]/\res=0 \/ ex v37 x xs; ens li=x::xs; sum(xs, v37); ens res=x+v37 [Z3 Time] 135 ms =================================== ========== Function: foldr_sum1 ========== [Specification] ex r; sum(xs, r); ens res=r+k [Normed Spec] ex r; sum(xs, r); ens res=r+k [Raw Post Spec] ex v65; ens emp; ens v65=(fun c t v61 (*@ ens v61=c+t @*) -> c + t); ens emp; ex v63; foldr(v65, xs, k, v63); ens res=v63 [Normed Post] ex v63 v65; ens v65=(fun c t v61 (*@ ens v61=c+t @*) -> c + t); foldr(v65, xs, k, v63); ens res=v63 [Forward Time] 21 ms [Entail Check] true [Entail Time] 868 ms [Z3 Time] 880 ms ========================================== ========== Function: foldr_sum2 ========== [Specification] ex r; sum(xs, r); ens res=r+init [Normed Spec] ex r; sum(xs, r); ens res=r+init [Raw Post Spec] ex v210; ens emp; ens v210=(fun c t v206 (*@ ens v206=c+t @*) -> c + t); ens emp; ex v208; foldr(v210, xs, init, v208); ens res=v208 [Normed Post] ex v208 v210; ens v210=(fun c t v206 (*@ ens v206=c+t @*) -> c + t); foldr(v210, xs, init, v208); ens res=v208 [Forward Time] 20 ms [Entail Check] false [Entail Time] 1129 ms [Z3 Time] 1153 ms ========================================== ========== FINAL SUMMARY ========== [ LOC ] 22 [ LOS ] 2 [Forward+Entail] 2.038308 s [ AskZ3 ] 2.42426109314 s ```

In particular, the entailment check for foldr_sum1 passes, but not foldr_sum2.

The difference between the two programs is the naming of the variable k (passes) vs init (fails). This is weird, since the SMT generated for both programs are exactly the same (except for the naming of the variables).

SMT for foldr_sum1 (passes. is unsatisfiable)

```smt (declare-datatypes ((List 0)) (((nil) (cons (head Int) (tail List))))) (declare-fun v121 () List) (declare-fun v93 () List) (declare-fun v119 () Int) (declare-fun v161 () Int) (declare-fun v120 () Int) (declare-fun xs () List) (declare-fun v148 () Int) (declare-fun v92 () Int) (declare-fun v91 () Int) (declare-fun k () Int) (declare-fun res () Int) (assert (= v93 v121)) (assert (= v161 v119)) (assert (= xs (cons v120 v121))) (assert (= v161 v148)) (assert (= xs (cons v92 v93))) (assert (= (+ k (* (- 1) v91) v92) 0)) (assert (= (+ res (* (- 1) v91) (* (- 1) v148)) 0)) (assert (let ((a!1 (exists ((r Int)) (and (= (+ v120 v119 (* (- 1) r)) 0) (= (+ res (* (- 1) k) (* (- 1) r)) 0))))) (not a!1))) (check-sat) ```

SMT for foldr_sum2 (fails. result is unknown due to incomplete quantifiers (by querying get_reason_unknown))

```smt (declare-datatypes ((List 0)) (((nil) (cons (head Int) (tail List))))) (declare-fun v266 () List) (declare-fun v238 () List) (declare-fun v264 () Int) (declare-fun v306 () Int) (declare-fun v265 () Int) (declare-fun xs () List) (declare-fun v293 () Int) (declare-fun v237 () Int) (declare-fun v236 () Int) (declare-fun init () Int) (declare-fun res () Int) (assert (= v238 v266)) (assert (= v306 v264)) (assert (= xs (cons v265 v266))) (assert (= v306 v293)) (assert (= xs (cons v237 v238))) (assert (= (+ init (* (- 1) v236) v237) 0)) (assert (= (+ res (* (- 1) v236) (* (- 1) v293)) 0)) (assert (let ((a!1 (exists ((r Int)) (and (= (+ v265 v264 (* (- 1) r)) 0) (= (+ res (* (- 1) init) (* (- 1) r)) 0))))) (not a!1))) (check-sat) ```

Given the SMT for both programs, running them locally correctly produces unsat.

There's a few ways that I can try to workaround this, though I am not sure what's the root cause yet. Here they are (each independent of others)

Remove additional simplification step in 1 and 2
- In this case, removing the simplification step just leaves additional "true" T symbols and conjunctions. The expression to be checked seems logically equivalent to that without simplification.

With simplification

```smt (let ((a!1 (and (= v238 v266) (= v306 v264) (= xs (cons v265 v266)) (= v306 v293) (= xs (cons v237 v238)) (= v236 (+ v237 init)))) (a!2 (exists ((r Int)) (and (= r (+ v265 v264)) (= res (+ r init)))))) (let ((a!3 (=> (and a!1 (= res (+ v293 v236)) a!1) a!2))) (not a!3))) (check-sat) ```

Without simplification

```smt (let ((a!1 (and (= v236 v264) (= v304 v262) (= xs (cons v263 v264)))) (a!2 (and (= v304 v291) (= xs (cons v235 v236)) (= v234 (+ v235 init)))) (a!3 (exists ((r Int)) (and (= r (+ v263 v262)) (= res (+ r init)))))) (let ((a!4 (=> (and true a!1 a!2 (= res (+ v291 v234)) true a!1 (and true true (and true true)) true a!2 (and true true (and true true)) (and true true (and true true)) true (and true true)) a!3))) (not a!4))) (check-sat) ```

Add a random seed (code) (Source from Z3's issue repository)
Turn on "proof generation" (code):
- I am not sure why this works, maybe turning on proof allows the solver to try more options. In fact (weirdly), the time taken for Z3 to prove the entailment is around ~600ms, better than turning off proof generation.
Use WHY3 prover

dariusf commented 10 months ago

We don't do anything special to handle quantifiers, so we're at the mercy of Z3's heuristics, which happen to involve variable names (your search finds a lot of similar issues), formula structure (simplification), and options like proof generation.

Yesterday I also encountered differences across (patch) versions: unsat on 4.12.2, unknown on 4.12.4. I've fixed the version to the former for now. Fixing the seed is similar: we get a bit more determinism, but ultimately we're still relying on heuristics, just biased a certain way.

I think the long term solution for us is to use Why3, as it interfaces with Z3 more optimally than we do and allows other provers to be used, so we're not as reliant on the quirks of any particular one. For hard problems we may have to provide our own e.g. triggers.

EmilyOng commented 10 months ago

Thanks for explaining!

hipsleek / Heifer

Inconsistent behaviour for `foldr_sum` #7

Issue