vprover / vampire

The Vampire Theorem Prover
https://vprover.github.io/
Other
302 stars 52 forks source link

smt2 input not satisfiable, but preprocessed & fixed up input is #403

Closed bobismijnnaam closed 2 years ago

bobismijnnaam commented 2 years ago

Hi, I have an smt2 snippet which I think should be satisfiable, but vampire cannot establish that. If I take the preprocessed tptp of that, and fix it up because it contains some problems, vampire can satisfy it somehow. So my questions are:

The smt2 is as follows:

(declare-datatypes (($Snap 0)) ((($Snap.unit) ($Snap.combine ($Snap.first $Snap) ($Snap.second $Snap)))))
(define-sort $Perm() Real)
(define-fun $Perm.No () $Perm 0.0)
(define-fun $Perm.isValidVar ((p $Perm)) Bool (<= $Perm.No p))
(check-sat)

When running this with vampire 4.6.1-sledgehemmer-eval, I get a timeout after 60 seconds, using the following command:

$ vampire-461-sledge --mode casc_sat test.smt2

I tried to have a look at the preprocessed input as follows:

$ vampire --input_syntax smtlib2 --mode clausify test.smt2
tff(type_def_5, type, '$Snap()': $tType).
tff(func_def_0, type, '$Snap.unit': '$Snap()').
tff(func_def_1, type, '$Snap.combine': ('$Snap()' * '$Snap()') > '$Snap()').
tff(func_def_2, type, '$Snap.first': '$Snap()' > '$Snap()').
tff(func_def_3, type, '$Snap.second': '$Snap()' > '$Snap()').
tff(func_def_5, type, '$Perm.No': $real).
tff(pred_def_2, type, '$Perm.isValidVar': $real > $o).
cnf(u8,axiom,
    ($true = X0) | ($false = X0)).

cnf(u7,axiom,
    ($true != $false)).

cnf(u6,axiom,
    '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3).

cnf(u5,axiom,
    '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1).

cnf(u4,axiom,
    '$Snap.unit' != '$Snap.combine'(X0,X1)).

cnf(u3,axiom,
    '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0).

When giving this to vampire, I get the following message:

$ vampire --input_syntax smtlib2 --mode clausify test.smt2 | vampire
Parsing Error on line 9
Cannot create equality between terms of different types.
X0 is $i
$true is $o

There are a bunch of errors like this in the clausified output; using --mode preprocess has similar problems.

Fixing all the problems (in a naive way, I must add. I don't know tptp very well), I get the following:

tff(type_def_5, type, '$Snap()': $tType).
tff(func_def_0, type, '$Snap.unit': '$Snap()').
tff(func_def_1, type, '$Snap.combine': ('$Snap()' * '$Snap()') > '$Snap()').
tff(func_def_2, type, '$Snap.first': '$Snap()' > '$Snap()').
tff(func_def_3, type, '$Snap.second': '$Snap()' > '$Snap()').
tff(func_def_5, type, '$Perm.No': $real).
tff(pred_def_2, type, '$Perm.isValidVar': $real > $o).
tff(u8,axiom,
    ![X0: $o]:
    (($true = X0) | ($false = X0))).

tff(u7,axiom,
    ($true != $false)).

tff(u6,axiom,
    ![X0: '$Snap()', X1: '$Snap()', X2: '$Snap()', X3: '$Snap()']:
    ('$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3)).

tff(u5,axiom,
    ![X0: '$Snap()', X1: '$Snap()', X2: '$Snap()', X3: '$Snap()']:
    ('$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1)).

tff(u4,axiom,
    ![X0: '$Snap()', X1: '$Snap()']:
    ('$Snap.unit' != '$Snap.combine'(X0,X1))).

tff(u3,axiom,
    ![X0: '$Snap()']:
    ('$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0)).

Vampire reports this is satisfiable:

$ vampire-461-sledge --mode casc_sat test_fixed.tptp
perf_event_open failed (instruction limiting will be disabled): Permission denied
% Running in auto input_syntax mode. Trying TPTP
% ott+11_3_aac=none:afr=on:afp=4000:afq=1.4:amm=off:anc=all:bs=unit_only:bsr=on:bce=on:fde=unused:irw=on:nm=64:newcnf=on:nwc=1:nicw=on:sac=on:sp=reverse_arity:uhcvi=on_31 on test_fixed
% SZS status Satisfiable for test_fixed
% # SZS output start Saturation.
cnf(u15,axiom,
    '$Snap.unit' != '$Snap.combine'(X0,X1)).

cnf(u19,axiom,
    '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0 | '$Snap.unit' = X0).

cnf(u23,axiom,
    '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3).

cnf(u27,axiom,
    '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1).

cnf(u38,axiom,
    '$Snap.combine'(X1,X2) != X0 | '$Snap.second'(X0) = X2 | '$Snap.unit' = X0).

cnf(u45,axiom,
    '$Snap.second'('$Snap.combine'(X0,X1)) = X1).

cnf(u51,axiom,
    '$Snap.combine'(X0,X1) = '$Snap.combine'('$Snap.first'('$Snap.combine'(X0,X1)),X1)).

cnf(u55,axiom,
    '$Snap.combine'(X1,X2) != X0 | '$Snap.first'(X0) = X1 | '$Snap.unit' = X0).

cnf(u72,axiom,
    '$Snap.first'('$Snap.combine'(X0,X1)) = X0).

cnf(u78,axiom,
    '$Snap.combine'(X10,X11) != '$Snap.combine'(X12,X13) | '$Snap.first'('$Snap.combine'(X10,X11)) = X12).

cnf(u85,axiom,
    '$Snap.second'(X0) = '$Snap.second'(X1) | X0 != X1 | '$Snap.unit' = X1 | '$Snap.unit' = X0).

cnf(u105,axiom,
    '$Snap.first'(X0) = '$Snap.first'(X1) | X0 != X1 | '$Snap.unit' = X1 | '$Snap.unit' = X0).

cnf(u125,axiom,
    '$Snap.combine'(X2,X3) != X4 | '$Snap.first'('$Snap.combine'(X2,X3)) = '$Snap.first'(X4) | '$Snap.unit' = X4).

cnf(u129,axiom,
    X0 != X1 | '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X1)) = X0 | '$Snap.unit' = X1).

cnf(u137,axiom,
    X0 != X1 | '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X1),'$Snap.second'(X0)) = X0 | '$Snap.unit' = X1).

cnf(u143,axiom,
    X3 != X4 | X3 != X5 | '$Snap.unit' = X5 | '$Snap.unit' = X3 | '$Snap.second'(X4) = '$Snap.second'(X5) | '$Snap.unit' = X4).

cnf(u147,axiom,
    X3 != X4 | X3 != X5 | '$Snap.unit' = X5 | '$Snap.unit' = X3 | '$Snap.first'(X4) = '$Snap.first'(X5) | '$Snap.unit' = X4).

% # SZS output end Saturation.
% ------------------------------
% Version: Vampire 4.6.1 (commit af1735c99 on 2021-12-01 14:43:47 +0100)
% Linked with Z3 4.8.13.0 f03d756e086f81f2596157241e0decfb1c982299 z3-4.8.4-5390-gf03d756e0
% Termination reason: Satisfiable

% Memory used [KB]: 5628
% Time elapsed: 0.003 s
% ------------------------------
% ------------------------------
% Success in time 0.029 s
selig commented 2 years ago

A few super quick points as I’m away from laptop but will follow up when back.

Vampire assumes it is incomplete for data types. It’s definitely incomplete for real arithmetic. Even if we’re in a case where our procedure is obviously complete we don’t support this.... maybe that’s a feature request.

There’s another output mode that fixes those type errors, maybe it’s called vclausify. We need a different one as the quantifiers mean it’s not real cnf.

Now the interesting bit. Because we can’t add a complete set of axioms for data types, the output problem in Tptp is not equivalent to the input smt one. We should give a warning about this. It’s because tptp doesn’t have data types... for arithmetic we’re okay on this front.

Sent from my iPhone

On 26 Jul 2022, at 08:02, Bob Rubbens @.***> wrote:



Hi, I have an smt2 snippet which I think should be satisfiable, but vampire cannot establish that. If I take the preprocessed tptp of that, and fix it up because it contains some problems, vampire can satisfy it somehow. So my questions are:

The smt2 is as follows:

(declare-datatypes (($Snap 0)) ((($Snap.unit) ($Snap.combine ($Snap.first $Snap) ($Snap.second $Snap))))) (define-sort $Perm() Real) (define-fun $Perm.No () $Perm 0.0) (define-fun $Perm.isValidVar ((p $Perm)) Bool (<= $Perm.No p)) (check-sat)

When running this with vampire 4.6.1-sledgehemmer-eval, I get a timeout after 60 seconds, using the following command:

$ vampire-461-sledge --mode casc_sat test.smt2

I tried to have a look at the preprocessed input as follows:

$ vampire --input_syntax smtlib2 --mode clausify test.smt2 tff(type_def_5, type, '$Snap()': $tType). tff(func_def_0, type, '$Snap.unit': '$Snap()'). tff(func_def_1, type, '$Snap.combine': ('$Snap()' * '$Snap()') > '$Snap()'). tff(func_def_2, type, '$Snap.first': '$Snap()' > '$Snap()'). tff(func_def_3, type, '$Snap.second': '$Snap()' > '$Snap()'). tff(func_def_5, type, '$Perm.No': $real). tff(pred_def_2, type, '$Perm.isValidVar': $real > $o). cnf(u8,axiom, ($true = X0) | ($false = X0)).

cnf(u7,axiom, ($true != $false)).

cnf(u6,axiom, '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3).

cnf(u5,axiom, '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1).

cnf(u4,axiom, '$Snap.unit' != '$Snap.combine'(X0,X1)).

cnf(u3,axiom, '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0).

When giving this to vampire, I get the following message:

$ vampire --input_syntax smtlib2 --mode clausify test.smt2 | vampire Parsing Error on line 9 Cannot create equality between terms of different types. X0 is $i $true is $o

There are a bunch of errors like this in the clausified output; using --mode preprocess has similar problems.

Fixing all the problems (in a naive way, I must add. I don't know tptp very well), I get the following:

tff(type_def_5, type, '$Snap()': $tType). tff(func_def_0, type, '$Snap.unit': '$Snap()'). tff(func_def_1, type, '$Snap.combine': ('$Snap()' * '$Snap()') > '$Snap()'). tff(func_def_2, type, '$Snap.first': '$Snap()' > '$Snap()'). tff(func_def_3, type, '$Snap.second': '$Snap()' > '$Snap()'). tff(func_def_5, type, '$Perm.No': $real). tff(pred_def_2, type, '$Perm.isValidVar': $real > $o). tff(u8,axiom, ![X0: $o]: (($true = X0) | ($false = X0))).

tff(u7,axiom, ($true != $false)).

tff(u6,axiom, ![X0: '$Snap()', X1: '$Snap()', X2: '$Snap()', X3: '$Snap()']: ('$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3)).

tff(u5,axiom, ![X0: '$Snap()', X1: '$Snap()', X2: '$Snap()', X3: '$Snap()']: ('$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1)).

tff(u4,axiom, ![X0: '$Snap()', X1: '$Snap()']: ('$Snap.unit' != '$Snap.combine'(X0,X1))).

tff(u3,axiom, ![X0: '$Snap()']: ('$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0)).

Vampire reports this is satisfiable:

$ vampire-461-sledge --mode casc_sat test_fixed.tptp perf_event_open failed (instruction limiting will be disabled): Permission denied % Running in auto input_syntax mode. Trying TPTP % ott+11_3_aac=none:afr=on:afp=4000:afq=1.4:amm=off:anc=all:bs=unit_only:bsr=on:bce=on:fde=unused:irw=on:nm=64:newcnf=on:nwc=1:nicw=on:sac=on:sp=reverse_arity:uhcvi=on_31 on test_fixed % SZS status Satisfiable for test_fixed % # SZS output start Saturation. cnf(u15,axiom, '$Snap.unit' != '$Snap.combine'(X0,X1)).

cnf(u19,axiom, '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X0)) = X0 | '$Snap.unit' = X0).

cnf(u23,axiom, '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X2 = X3).

cnf(u27,axiom, '$Snap.combine'(X0,X2) != '$Snap.combine'(X1,X3) | X0 = X1).

cnf(u38,axiom, '$Snap.combine'(X1,X2) != X0 | '$Snap.second'(X0) = X2 | '$Snap.unit' = X0).

cnf(u45,axiom, '$Snap.second'('$Snap.combine'(X0,X1)) = X1).

cnf(u51,axiom, '$Snap.combine'(X0,X1) = '$Snap.combine'('$Snap.first'('$Snap.combine'(X0,X1)),X1)).

cnf(u55,axiom, '$Snap.combine'(X1,X2) != X0 | '$Snap.first'(X0) = X1 | '$Snap.unit' = X0).

cnf(u72,axiom, '$Snap.first'('$Snap.combine'(X0,X1)) = X0).

cnf(u78,axiom, '$Snap.combine'(X10,X11) != '$Snap.combine'(X12,X13) | '$Snap.first'('$Snap.combine'(X10,X11)) = X12).

cnf(u85,axiom, '$Snap.second'(X0) = '$Snap.second'(X1) | X0 != X1 | '$Snap.unit' = X1 | '$Snap.unit' = X0).

cnf(u105,axiom, '$Snap.first'(X0) = '$Snap.first'(X1) | X0 != X1 | '$Snap.unit' = X1 | '$Snap.unit' = X0).

cnf(u125,axiom, '$Snap.combine'(X2,X3) != X4 | '$Snap.first'('$Snap.combine'(X2,X3)) = '$Snap.first'(X4) | '$Snap.unit' = X4).

cnf(u129,axiom, X0 != X1 | '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X0),'$Snap.second'(X1)) = X0 | '$Snap.unit' = X1).

cnf(u137,axiom, X0 != X1 | '$Snap.unit' = X0 | '$Snap.combine'('$Snap.first'(X1),'$Snap.second'(X0)) = X0 | '$Snap.unit' = X1).

cnf(u143,axiom, X3 != X4 | X3 != X5 | '$Snap.unit' = X5 | '$Snap.unit' = X3 | '$Snap.second'(X4) = '$Snap.second'(X5) | '$Snap.unit' = X4).

cnf(u147,axiom, X3 != X4 | X3 != X5 | '$Snap.unit' = X5 | '$Snap.unit' = X3 | '$Snap.first'(X4) = '$Snap.first'(X5) | '$Snap.unit' = X4).

% # SZS output end Saturation. % ------------------------------ % Version: Vampire 4.6.1 (commit af1735c99 on 2021-12-01 14:43:47 +0100) % Linked with Z3 4.8.13.0 f03d756e086f81f2596157241e0decfb1c982299 z3-4.8.4-5390-gf03d756e0 % Termination reason: Satisfiable

% Memory used [KB]: 5628 % Time elapsed: 0.003 s % ------------------------------ % ------------------------------ % Success in time 0.029 s

— Reply to this email directly, view it on GitHubhttps://github.com/vprover/vampire/issues/403, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAJCILQD7YXFA3A5TZKL3F3VV74ZHANCNFSM54WI4J7A. You are receiving this because you are subscribed to this thread.Message ID: @.***>

quickbeam123 commented 2 years ago

@selig , I think you have a typo there, the mode is called tclausify.

(Still, the output clause set will be far from a complete axiomatization of real (or any other) arithmetic or datatypes. So it's satisfiability does not imply the same for the source input.)

bobismijnnaam commented 2 years ago

Thanks for your quick replies! I have some more questions about this if that's okay.

Vampire assumes it is incomplete for data types.

In "Coming to Terms with Quantified Reasoning" it is mentioned that the datatype support implemented in Vampire is actually complete. Did the implementation change and is it now incomplete, or is it incomplete in a different way, say combined with reals or functions, as in my example?

Also, does this incompleteness mean vampire might say "unsatisfiable" when the input is actually satisfiable, meaning, can be refuted? Or am I now mixing up refutation prover terminology?

Regarding this issue: to me it feels like this is not something that's easily fixable, so if this is the case this issue can be closed.

EDIT:

I commented out some more parts of my smt2 input and am still wondering about the behavior I see. If I comment out the smt2 datatypes declaration and use --mode casc, vampire just spins, i.e., repeatedly restarts with "refutation not found, incomplete strategy". If I first preprocess it with tclausify, vampire determines the input to be satisfiable immediately. Finally, if I remove the define-fun, vampire can establish satisfiability without tclausify. Having or not having Real in there makes no difference.

The input:

(define-sort $Perm() Real)
(define-fun $Perm.No () $Perm 0.0)
(check-sat)

It seems that the behaviour I'm seeing has more to do with how vampire axiomates smt's define-fun, and this subtlety is not shown when processing the smt with tclausify? If you'd have time to look into this silly edge case or explain it to me I'd be very grateful. But if this is an unimportant edge case for vampire feel free to ignore this comment & close the issue!

laurakovacs commented 2 years ago

The first-order theory of algebraic data types is decidable (or complete as you mention), as long as it is only about the term algebra (that is, only ADT constructors/destructors) . There are decision procedures for proving ADT formulas; but again, these are decision procedures when the formulas use ADT symbols only. In the " "Coming to Terms with Quantified Reasoning" paper we show decidability for a conservative extension of the ADT theory (restricted to ADT symbols and a subterm predicate); no uninterpreted function and other arbitrary sorts.

Once you start using additional symbols, such as uninterpreted functions or theory-symbols from different theories (e.g. reals, integers), the theory becomes undecidable. The undecidability proof is actually given in the "Coming to Terms with Quantified Reasoning" paper (Theorem 2).

The ADT reasoning in Vampire is based on the incomplete variation of superposition calculus + ADT (section 6 of the aforementioned paper). This variation comes with the advantage that we can work with ADT formulas combined with other theories, but it is incomplete. If a formula is provable/valid, we should be able to find a proof; but that is kind of aall we can guarantee. Satisfiability of ADT properties were not our focus yet, as as said, it cannot be made complete for arbitrary first-order formulas involving ADTs (but not just).

bobismijnnaam commented 2 years ago

Thank you for the explanation, that clears it up for me.

Two more questions if that's okay.

First, if you axiomatize datatypes "manually" using quantifiers and types in tff, is it the case that you get similar expressive power compared to vampire's native ADT support, but performance suffers because new quantifiers are introduced for each datatype?

Second, is there a mailing list or forum that I should use if I want to ask more questions? Or should I use the github issues for that?

quickbeam123 commented 2 years ago

If I remember correctly, you cannot finitely axiomatize the ADT theory, so "manual" will always be incomplete. Even if I am wrong, tailored inference rules which vampire employs are almost always better than axiomatizations, which they replace, because of how the inference rules interact with one another and with the rest of the calculus.

Issues seem to be a good way to reach us with easy to formulate usability questions.

bobismijnnaam commented 2 years ago

Okay, thank you all very much!