Proof check fails on a QF_UF problem

BrunoDutertre commented 1 year ago

I've been experimenting with cvc5 proof generation using the Alethe format. On QF_UF problems, cvc5 generates many proofs that the checker rejects. I'm reporting the bug here, because if looks like a problem with the checker and not with cvc5 (as far as I can tell, but I may be wrong).

I'm attaching an SMT2 example where this happens:

(set-logic QF_UF)
(declare-sort I 0)
(declare-fun o (I I) I)
(declare-fun e () I)
(declare-fun e3 () I)
(declare-fun e2 () I)
(declare-fun e1 () I)
(assert
 (let ((?v_10 (o (o e3 e2) (o e e))))
   (and (or (= e2 (o e3 e2)) (= (o e3 e2) (o (o e3 e2) e2)))
        (or (= e3 (o e3 e2)) (= e3 (o e2 e2)) (= e3 (o (o e3 e2) e2)))
        (or (= e1 (o e3 e2)) (= (o e3 e2) (o e2 e2))))))
(assert (= (o (o e e) (o e3 e2)) (o (o e3 e2) (o e e))))
(assert (not (= (o (o e3 e2) e2) (o (o e3 e2) (o e e)))))
(assert (and (= e2 (o e1 (o e3 e2))) (= e3 (o (o e e) (o e3 e2))) (= (o e e) (o (o e3 e2) (o e3 e2)))))
(check-sat)

To reproduce the problem: 1) run cvc5 with these options:

cvc5 --no-symmetry-breaker \
  --dump-proofs --proof-format-mode=alethe --dag-thresh=0 --simplification=none --proof-granularity=theory-rewrite \
  test.smt2 | sed -e 's/unsat//' > test.prf

2) check the proof:

carcara check test.prf test.smt2 --expand-let-bindings --skip-unknown-rules

It's not deterministic but it fails on step t165 with different pivot was not eliminated errors:

[ERROR] checking failed on step 't165' with rule 'resolution': pivot was not eliminated:
      '(= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2)))'
invalid

Here's the relevant proof fragment. It looks like the negative/positive occurrences of this pivot produce the right result in the conclusion. Note that the pivot occurs multiple times in the conclusion and in the premises.

(step t88
 (cl (= (o e3 e2) (o e2 e2))
     (not (= (o e3 e2) (o (o e3 e2) e2)))
     (not (= e3 (o e2 e2)))
     (not (= e1 (o e3 e2)))
     (not (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))))
     (not (= e2 (o e1 (o e3 e2))))
     (not (= e3 (o (o e e) (o e3 e2))))
     (not (= (o e e) (o (o e3 e2) (o e3 e2)))))
     :rule reordering :premises (t87))

(step t89
 (cl (= (o e e) (o (o e3 e2) (o e3 e2)))) :rule and :premises (a3))

(step t90
 (cl (= e3 (o (o e e) (o e3 e2)))) :rule and :premises (a3))

(step t91
 (cl (= e2 (o e1 (o e3 e2)))) :rule and :premises (a3))

(step t92
 (cl (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2)))) :rule symm :premises (a1))

(step t93
 (cl (= e1 (o e3 e2))
     (= (o e3 e2) (o e2 e2))) :rule or :premises (t17))

(step t95
 (cl (= e3 (o e3 e2))
     (= e3 (o e2 e2))
     (= e3 (o (o e3 e2) e2))) :rule or :premises (t94))

(step t134
 (cl (= (o (o e3 e2) (o e e)) (o (o e3 e2) e2))
     (not (= e3 (o (o e3 e2) e2)))
     (not (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))))
     (not (= e3 (o (o e e) (o e3 e2))))) :rule reordering :premises (t133))

(step t164
 (cl (= (o (o e3 e2) (o e e)) (o (o e3 e2) e2))
     (not (= (o e3 e2) (o (o e3 e2) e2)))
     (not (= e3 (o e3 e2)))
     (not (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))))
     (not (= e3 (o (o e e) (o e3 e2))))) :rule reordering :premises (t163))

(step t165
 (cl (= (o e3 e2) (o e2 e2))
     (not (= (o e3 e2) (o (o e3 e2) e2)))
     (= (o e3 e2) (o e2 e2))
     (= (o (o e3 e2) (o e e)) (o (o e3 e2) e2))
     (not (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))))
     (not (= e3 (o (o e e) (o e3 e2))))
     (= (o (o e3 e2) (o e e)) (o (o e3 e2) e2))
     (not (= (o e3 e2) (o (o e3 e2) e2)))
     (not (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))))
     (not (= e3 (o (o e e) (o e3 e2)))))
      :rule resolution :premises (t88 t89 t90 t91 t92 t93 t95 t134 t164))

HanielB commented 1 year ago

Thanks for the detailed report, Bruno. You have stumbled on a limitation of Carcará's resolution checker that we should have properly documented. When the resolution pivots are not provided, a general solution would require backtracking, but Carcará currently uses a greedy algorithm to check resolution steps (I'll paste below @bpandreotti's detailed explanation). We will implement the more expensive resolution checking so that it does not fail in cases like this one.

BTW, regarding cvc5 proofs, in the proof-new branch the support for Alethe is significantly more mature than what is currently in main. We are in the process of merging into main the new features and bug fixes, but if you are experimenting it may be best to use that branch for now. In particular with the option --proof-alethe-res-pivots cvc5 prints the pivots as arguments to the resolution step, which in this case allows the checker to verify this step, as @bpandreotti describes below.

From @bpandreotti :

In your example, step t88 introduces the term (not (= e3 (o (o e e) (o e3 e2)))). Step t90 introduces the term (= e3 (o (o e e) (o e3 e2))), however, since (not (= e3 (o (o e e) (o e3 e2)))) is in the final conclusion, Carcará thinks it should not be eliminated, and instead keeps (= e3 (o (o e e) (o e3 e2))) as a pivot, hoping /it will/ be eliminated later. And while (not (= e3 (o (o e e) (o e3 e2)))) is in fact introduced in later clauses, it is in clauses that already have a pivot to eliminate, and, as each clause may only eliminate one pivot, (= e3 (o (o e e) (o e3 e2))) never ends up being eliminated. The correct behaviour would be for Carcará to also try the possibility that (= e3 (o (o e e) (o e3 e2))) is eliminated with the term introduced in t88, instead of just relying on which terms are in the final clause to guess which should be eliminated.

One sure-fire way of ensuring Carcará can check any resolution step is by providing the pivots used as arguments. While this is not officially documented anywhere, Carcará can accept arguments for the resolution rule that provide the pivots. The expected format is this:

For each step of binary resolution, the rule must receive two arguments: the pivot term, and a boolean indicating whether the pivot appears with positive polarity in the left-hand or right-hand clause. That is, if this second argument is true, the checker will search for the pivot in the left-hand clause, and for its negation in the right-hand clause. If it is false, the checker will do the opposite. For a resolution step with n premises, there should be a total of 2*(n - 1) arguments. For the example you gave, if you pass these arguments to the t165 step, it checks ok:
:args (
    (= (o e e) (o (o e3 e2) (o e3 e2))) false
    (= e3 (o (o e e) (o e3 e2))) false
    (= e2 (o e1 (o e3 e2))) false
    (= (o (o e3 e2) (o e e)) (o (o e e) (o e3 e2))) false
    (= e1 (o e3 e2)) false
    (= e3 (o e2 e2)) false
    (= e3 (o (o e3 e2) e2)) true
    (= e3 (o e3 e2)) true
)

c-cube commented 1 year ago

Pardon my lack of context, but another way of checking resolution steps without backtracking is to use unit propagation with the negation of the expected conclusion as initial trail (RUP style). That's reasonably simple if more robustness is required.

HanielB commented 1 year ago

That's a very good point! It may indeed be better to just bite the bullet and have checking via unit resolution as a fallback when the greedy strategy to determine the pivots failed. What do you think @bpandreotti ?

bpandreotti commented 1 year ago

Sounds good to me! I'll work on implementing this as soon as I can.

BrunoDutertre commented 1 year ago

How general do you want your resolution rule to be?

If you know the order in which to apply the resolution steps, then you don't need the pivots. Based on your explanation, it looks like you process premises sequentially: C0 := premise0 C1 := resolve(C0, premise1) C2 := resolve(C1, premise2) etc.

If that's the case, then listing the pivots is redundant. You can always determine the pivot in resolve(Ci, premise i) and checking should be straightforward. Do you ever need anything more complex than that?

FYI: I've tried the option to print pivots. It sometimes produce incorrect proofs. That seems to be a cvc5 bug.

bpandreotti commented 1 year ago

The issue with determining the pivot in resolve(Ci, premise i) is that, in some cases, there are multiple valid pivots that may be used to resolve the clauses. Granted, this only happens in pretty exotic examples, but some Alethe proofs generated by veriT have failed to check using this simpler algorithm. Because of that, we implemented this heuristic that looks at the conclusion clause to infer which pivots to use. However, it is cleary not sufficiently general, as your original example shows.

Still, I do believe that an algorithm for checking resolution steps based on reverse unit propagation will be sufficiently general, and won't need the pivots to be listed.

bpandreotti commented 1 year ago

6529de8 implements a basic version of RUP-based resolution checking, as a fallback to the heuristic algorithm. Now, the example you gave checks without errors, even if the pivots aren't given as arguments.

BrunoDutertre commented 1 year ago

Thanks for the fix. It works much better now, but I still see one error. I'll open a new issue.

ufmg-smite / carcara

Proof check fails on a QF_UF problem #1