vprover / vampire

The Vampire Theorem Prover
https://vprover.github.io/
Other
282 stars 49 forks source link

How make Vampire return true/false for a given problem (question)? #560

Closed k00ni closed 3 months ago

k00ni commented 3 months ago

I am still very new to the theorem prover world and currently experimenting with Vampire. Its hard to find beginner friendly resources about TPTP + Vampire :), so I hope I can find some insight here.

I would like to give Vampire a problem (problem.tptp) and want it to return true or false (besides the usual output string). In a way, as if I would use an API-function which returns a boolean:

bool answer_question(problem.tptp) 
{ /* ... */ }

The use case is translating a few laws/rules to TPTP and use Vampire to answer/verify a "given question". So my TPTP-file contains a few axioms and in the end a question. My assumption is that the type = question is required so it returns a boolean. I've read #367, but was not able to find insight. Here are my attempts and what I got so far. Any tips are very appreciated.


Attempt 1

File problem.tptp looks like:

fof(axiom1, axiom, ![X]: (p(X) => q(X))).
fof(axiom2, axiom, ![X]: (q(X) => r(X))).
fof(axiom3, axiom, ![X]: (r(X) => ~p(X))).

% -----------------------------------------------------------
% I expect a false here
fof(my_question, question, ?[X]: (p(X) & ~p(X))).

Output is:

$ vampire -t 10 problem.tptp
% Running in auto input_syntax mode. Trying TPTP
% SZS status CounterSatisfiable for problem
% # SZS output start Saturation.
cnf(u12,axiom,
    ~r(X0) | p(X0)).

cnf(u11,axiom,
    ~q(X0) | r(X0)).

cnf(u10,axiom,
    ~p(X0) | q(X0)).

% # SZS output end Saturation.
% ------------------------------
% Version: Vampire 4.8 (commit 8d999c135 on 2023-07-12 16:43:10 +0000)
% Linked with Z3 4.9.1.0 6ed071b44407cf6623b8d3c0dceb2a8fb7040cee z3-4.8.4-6427-g6ed071b44
% Termination reason: Satisfiable

% Memory used [KB]: 4989
% Time elapsed: 0.031 s
% ------------------------------
% ------------------------------

Attempt 2

I used example TPTP code from https://github.com/vprover/vampire/issues/367#issuecomment-1132588963:

% there exists an object X that satisfies both predicates predA and predB where predB has "objectA" as a second arg in this case
fof(a1,axiom,
    ( ? [X] :
        ( predA(X)
        & predB(X,"objectA") ) )).

% "objectA" satisfies predA
fof(a2_1,axiom,
    ( predA("objectA") )).

% "objectB" satisfies predA
fof(a2_2,axiom,
    ( predA("objectB") )).

% "objectC" satisfies predA
fof(a2_3,axiom,
    ( predA("objectC") )).

% for all X, if predA is satisfied for X, then X must be one of `"objectA",  "objectB", "objectC"` and can NOT be anything else
fof(a3,axiom,
    ( ! [X] :
        ( predA(X)
       => ( X = "objectA"
          | X = "objectB"
          | X = "objectC" ) ) )).

% Question, find all objects that satisfy predB with second arg "objectA"
fof(question,question,(?[X]:( predB(X,"objectA") )) ).

Output is:

$ vampire -t 10 problem.tptp 
% Running in auto input_syntax mode. Trying TPTP
% Refutation found. Thanks to Tanya!
% SZS status Theorem for problem
% SZS output start Proof for problem
1. ? [X0] : (predB(X0,"objectA") & predA(X0)) [input]
6. ? [X0] : predB(X0,"objectA") [input]
7. ~? [X0] : predB(X0,"objectA") [negated conjecture 6]
11. ! [X0] : ~predB(X0,"objectA") [ennf transformation 7]
12. ? [X0] : (predB(X0,"objectA") & predA(X0)) => (predB(sK0,"objectA") & predA(sK0)) [choice axiom]
13. predB(sK0,"objectA") & predA(sK0) [skolemisation 1,12]
18. predB(sK0,"objectA") [cnf transformation 13]
23. ~predB(X0,"objectA") [cnf transformation 11]
24. $false [resolution 18,23]
% SZS output end Proof for problem
% ------------------------------
% Version: Vampire 4.8 (commit 8d999c135 on 2023-07-12 16:43:10 +0000)
% Linked with Z3 4.9.1.0 6ed071b44407cf6623b8d3c0dceb2a8fb7040cee z3-4.8.4-6427-g6ed071b44
% Termination reason: Refutation

% Memory used [KB]: 4989
% Time elapsed: 0.037 s
% ------------------------------
% ------------------------------

The output contains 24. $false [resolution 18,23] and my assumption is, that the question is answered with false.


Attempt 3 - with program parameters

File problem.tptp looks like:

fof(axiom1, axiom, ![X]: (p(X) => q(X))).
fof(axiom2, axiom, ![X]: (q(X) => r(X))).
fof(axiom3, axiom, ![X]: (r(X) => ~p(X))).

% -----------------------------------------------------------
% I expect a false here
fof(my_question, question, ?[X]: (p(X) & ~p(X))).

Vampire is called like:

vampire --avatar off --question_answering answer_literal -t 10 problem.tptp

But its the same output:

% Running in auto input_syntax mode. Trying TPTP
% SZS status CounterSatisfiable for problem
% # SZS output start Saturation.
cnf(u13,axiom,
    ~r(X0) | ~p(X0)).

cnf(u12,axiom,
    ~q(X0) | r(X0)).

cnf(u11,axiom,
    ~p(X0) | q(X0)).

% # SZS output end Saturation.
% ------------------------------
% Version: Vampire 4.8 (commit 8d999c135 on 2023-07-12 16:43:10 +0000)
% Linked with Z3 4.9.1.0 6ed071b44407cf6623b8d3c0dceb2a8fb7040cee z3-4.8.4-6427-g6ed071b44
% Termination reason: Satisfiable

% Memory used [KB]: 511
% Time elapsed: 0.034 s
% ------------------------------
% ------------------------------

I assume question_answering might the function I need here, but it seems it doesn't come into play?

MichaelRawson commented 3 months ago

Hi Konrad,

I am still very new to the theorem prover world and currently experimenting with Vampire. Its hard to find beginner friendly resources about TPTP + Vampire :), so I hope I can find some insight here.

Greetings! It's nice to have newcomers. :-)

I assume question_answering might the function I need here, but it seems it doesn't come into play?

I think the "question answering" mode is probably a distraction for you, unfortunately - let's get back to this.

The use case is translating a few laws/rules to TPTP and use Vampire to answer/verify a "given question". So my TPTP-file contains a few axioms and in the end a question. My assumption is that the type = question is required so it returns a boolean.

This is Vampire's default behaviour (no question required, conjecture would be fine here), and it is already giving the answer you want - albeit obtusely.

Your example ?[X]: (p(X) & ~p(X)) is not a theorem. Therefore Vampire doesn't say SZS status Theorem but instead SZS status CounterSatisfiable, which is ATP-speak for "not a theorem". You can read more about the status codes here or here.

If you were to try ?[X]: (p(X) | ~p(X)), Vampire will say SZS status Theorem and give a proof.


What you may find a little surprising is that both a conjecture and its negation can be non-theorems. If you try

fof(test, conjecture, p).

and

fof(test, conjecture, ~p).

Vampire will say that both are not theorems (because they are not).


What does the question-answering mode do, then? If you have a theorem and you want to know what term to "plug in" to the conjecture to get a proof, Prolog-style, Vampire can (optionally) do that. For example:

fof(pa, axiom, p(a)).
fof(q, question, ?[X]: p(X)).

is a theorem, but the "answer" is a.

k00ni commented 3 months ago

@MichaelRawson thank you for your detailed answer. The link to SZSOntology was helpful. I am still trying to wrap my head around all of this, although its hard to synchronize TPTP peculiarities, the Vampire software and Logic in general. Especially the kind of output Vampire is producing is still odd to me (being a developer who is used to APIs returning "simple" values).

When running Vampire with your example theorem:

fof(pa, axiom, p(a)).
fof(q, question, ?[X]: p(X)).

its output is:

$ vampire data/problem1.tptp 
% Running in auto input_syntax mode. Trying TPTP
% Refutation found. Thanks to Tanya!
% SZS status Theorem for problem1
% SZS output start Proof for problem1
1. p(a) [input]
2. ? [X0] : p(X0) [input]
3. ~? [X0] : p(X0) [negated conjecture 2]
4. ! [X0] : ~p(X0) [ennf transformation 3]
5. p(a) [cnf transformation 1]
6. ~p(X0) [cnf transformation 4]
7. $false [resolution 5,6]
% SZS output end Proof for problem1
% ------------------------------
% Version: Vampire 4.8 (commit 8d999c135 on 2023-07-12 16:43:10 +0000)
% Linked with Z3 4.9.1.0 6ed071b44407cf6623b8d3c0dceb2a8fb7040cee z3-4.8.4-6427-g6ed071b44
% Termination reason: Refutation

% Memory used [KB]: 4989
% Time elapsed: 0.074 s
% ------------------------------
% ------------------------------

I don't see the "answer" a you mentioned in the output. What I see based on SZSOntology is:

The last point is confusing because if a can be used to satisfy the model (so that all axioms are true), why does Vampire report $false (= contradiction)? Or is that a proof by contradiction? In a way, that its a theorem and it provides me a solution by using a proof by contradiction?

The proof by contradiction is based on the following quote from SZSOntology:

Theorem (THM): All models of Ax are models of C.

  • F is valid, and C is a theorem of Ax.
  • Possible dataforms are Proofs of C from Ax.
MichaelRawson commented 3 months ago

I don't see the "answer" a you mentioned in the output.

Right, this feature has some overhead so it's not on by default. Pass -qa on if you want this - but I emphasize again that if you only want yes/no "is it a theorem?" answers you don't need question-answering.

the input is a valid theorem

Correct.

it provides me a proof

Yes, you are fortunate. ;-) You can suppress this with -p off.

it states $false [resolution 5,6]

All Vampire proofs are by refutation: note that "negated conjecture" earlier in the proof.


You mention a true/false API for Vampire. The space of possible results is a bit richer than true/false (which is why we have the SZS ontology), but an API would be nice - there are serious implementation challenges here, but "at some point" this could happen in principle.

In the meantime, the Z3 SMT solver may work more in the way that you expect ("satisfiable" or "unsatisfiable"), and it has an API.

k00ni commented 3 months ago

Thanks again for the quick answer. My focus on true/false as an answer is because in my use case I use another system (PHP) to interact with Vampire. This system has to able to interpret the result after each Vampire call.

but I emphasize again that if you only want yes/no "is it a theorem?" answers you don't need question-answering.

My mistake, I meant it in the way for a given set of axioms is the given conjecture "correct" (true, holds, ...)? To be honest, I don't even care if its a theorem or not. My sole focus is a (almost) binary answer. My point with "question-answering" was solely because I read it in a context of LLMs and assumed its related (because of the question type in TPTP).

On the other hand its very interesting to use Vampire to explain why a certain set of axioms is not satisfiable or to even show a proof. Also, I will have a look into Z3 SMT.

All Vampire proofs are by refutation: note that "negated conjecture" earlier in the proof.

Does this mean, if I read negated conjecture and find a $false [resolution ...] in the output, it means Vampire thinks the model is satisfiable? And it reports "unsatisfiable" if my conjecture leads to a contradiction?

MichaelRawson commented 3 months ago

I think there is some confusion here. The way Vampire works is that it loads the axioms Ax and a conjecture C, negates the conjecture to get ~C, then forms the conjunction Ax /\ ~C. If this conjunction implies $false, then C is a consequence, a theorem of Ax.

"correct", "true", "holds" can mean different things depending on the context, but here I'd expect them to be synonymous with "C is a theorem of Ax".

To reiterate, and simplifying only very slightly: if Vampire says SZS status Theorem, the conjecture follows from the axioms. If it says something else, it doesn't.