vprover / vampire

The Vampire Theorem Prover
https://vprover.github.io/
Other
295 stars 51 forks source link

How to restrict possible answers #367

Closed logicReasoner closed 2 years ago

logicReasoner commented 2 years ago

Hello, this is not an issue with vampire itself but rather my lack of understanding on how to express the following so please excuse my ignorance :) I have the following axioms and question at the end:

% there exists an object X that satisfies both predicates predA and predB where predB has objectA as a second arg in this case
fof(a1,axiom,
    ( ? [X] :
        ( predA(X)
        & predB(X,objectA) ) )).

% objectA satisfies predA
fof(a2_1,axiom,
    ( predA(objectA) )).

% objectB satisfies predA
fof(a2_2,axiom,
    ( predA(objectB) )).

% objectC satisfies predA
fof(a2_3,axiom,
    ( predA(objectC) )).

% for all X, if predA is satisfied for X, then X must be one of `objectA,  objectB, objectC` and can NOT be anything else
fof(a3,axiom,
    ( ! [X] :
        ( predA(X)
       => ( X = objectA
          | X = objectB
          | X = objectC ) ) )).

% Question, find all objects that satisfy predB with second arg objectA
fof(question,question,(?[X]:( predB(X,objectA) )) ).

And I get the answer:

% SZS answers Tuple [[sK1]|_]

As the result of skolemization? My question is how to restrict the possible answers to just objectA, objectB, objectC. This is the idea of axiom a3 but it doesn't seem to do the job since the introduced symbol sK1 doesn't get unified with either of those. Basically, I'd like to have only my specified objects (objectA, objectB, objectC) and no sK* in the answer. Another question is how do I get all possible answers and not just the first one? Let me know if the above is not clear. Thanks! :)

P.S. With the more simple question fof(question,question,(?[X]:( predA(X) )) ). (I can only get vampire to output a single answer and not all possible ones), I get Tuple [[objectC]|_] (by chance?) in this case but I am pretty sure that sK1 is equally as likely to appear as an answer

MichaelRawson commented 2 years ago

Hi - I'm not the question-answering expert here, perhaps @selig or @hzzv can help when they get time. But the problem with the more-complex query ?[X]: predB(X,objectA) is that you only say (via a1) predB(sK1): not e.g. predB(objectA). I'm not sure what your intention here is - maybe you can expand a bit on what you're trying to achieve?

The simpler query ?[X]: predA(X) is more interesting. You are correct that sK1 is a reasonable answer to this query, as are objects A-C. This is because Vampire doesn't "know" (i.e. the input doesn't say otherwise) that objects A-C and sK1 are distinct. For example, it could be that objectA = objectB = objectC = sK1.

For this reason, if you query ?[X]: (predA(X) & X != objectA), Vampire gives no answers at all. This may be surprising! This can be fixed by adding the following axiom:

fof(distinct, axiom, objectA != objectB & objectA != objectC & objectB != objectC).

i.e. that objects A-C are all distinct from each other, if this is your intention. TPTP/Vampire has direct support for this kind of reasoning: if you write "object", "object" != "other object" is implicit.

logicReasoner commented 2 years ago

Hey @MichaelRawson . Thanks for your answer! In short, my intention is to restrict the possible VALID answers for X in fof(question,question,(?[X]:( predB(X,objectA) )) ). to only specifically mentioned before constant symbols (those are: objectA, objectB and objectC), i.e. something that would be called Domain Closure: The resulting model contains no more domain elements than those named by the constant symbols (the only named constants in the example above are: objectA, objectB and objectC). Currently when I get sK1 as an answer, I have no way of knowing whether it's one of objectA, objectB and objectC or some other constant symbol entirely. (For example, we could have an unrelated constant symbol objectD where fof(a5_1,axiom,( predD(objectD) )).. In that case objectD is NOT a valid answer to our query (although it's one of the named constants) since it's unrelated to any of the original predA axioms). So we have to either somehow omit sK1 from the answer or make it explicit that sK1 = one of objectA, objectB and objectC. That is the purpose of axiom a3 above but it seems to NOT work at all (I guess due to a limitation of the logic itself not Vampire?). Is there an alternative logic / extension supported by Vampire that would allow for expressing such constraints (Domain closure), i.e. only return valid answers that are named constants? Something like Database semantics in the context of say RDBMS.

Regarding your suggestion, in a later iteration, I've added:

fof(a4_1,axiom, ( objectA != objectB )).
fof(a4_2,axiom, ( objectA != objectC )).
fof(a4_3,axiom, ( objectB != objectC )).

which is equivalent to your more elegant one-liner fof(distinct, axiom, objectA != objectB & objectA != objectC & objectB != objectC). but still doesn't affect the outcome.

BTW, is this the only way (explicit inequalities) to turn on Unique name assumption (assume that each named constant refers to a distinct object)?

MichaelRawson commented 2 years ago

OK, thank you for explaining! I'll reply somewhat non-linearly, easiest to hardest.

BTW, is this the only way (explicit inequalities) to turn on Unique name assumption (assume that each named constant refers to a distinct object)?

No, Vampire actually has support for this! If you write e.g. "objectA" instead of objectA, all "quoted objects" will be assumed distinct from each other. To use your example

% there exists an object X that satisfies both predicates predA and predB where predB has "objectA" as a second arg in this case
fof(a1,axiom,
    ( ? [X] :
        ( predA(X)
        & predB(X,"objectA") ) )).

% "objectA" satisfies predA
fof(a2_1,axiom,
    ( predA("objectA") )).

% "objectB" satisfies predA
fof(a2_2,axiom,
    ( predA("objectB") )).

% "objectC" satisfies predA
fof(a2_3,axiom,
    ( predA("objectC") )).

% for all X, if predA is satisfied for X, then X must be one of `"objectA",  "objectB", "objectC"` and can NOT be anything else
fof(a3,axiom,
    ( ! [X] :
        ( predA(X)
       => ( X = "objectA"
          | X = "objectB"
          | X = "objectC" ) ) )).

% Question, find all objects that satisfy predB with second arg "objectA"
fof(question,question,(?[X]:( predB(X,"objectA") )) ).

I guess due to a limitation of the logic itself not Vampire?

Yes. I'll get into it later, but I believe Vampire is working correctly here.

Is there an alternative logic / extension supported by Vampire that would allow for expressing such constraints (Domain closure), i.e. only return valid answers that are named constants?

Not to my knowledge.

make it explicit that sK1 = one of objectA, objectB and objectC

This is the crux of the issue in my view. Vampire can deduce this (it just doesn't tell you in the proof because it's not used), but we can force it by e.g.

% reasonable(X) forces X to be one of the domain objects
fof(reasonable_answers, axiom, ![X]: ((X = "objectA" | X = "objectB" | X = "objectC") => reasonable(X))).

fof(question,question,(?[X]:( reasonable(X) & predB(X,"objectA") )) ).

Which is similar to your axiom a3 but phrased in such a way that Vampire is forced to discharge the assumptions. Unfortunately, we get this output:

% SZS status Theorem for test
% SZS answers Tuple [[]|_] for test
% SZS output start Proof for test
1. ? [X0] : (predB(X0,"objectA") & predA(X0)) [input]
5. ! [X0] : (predA(X0) => ("objectC" = X0 | "objectB" = X0 | "objectA" = X0)) [input]
6. ! [X0] : (("objectC" = X0 | "objectB" = X0 | "objectA" = X0) => reasonable(X0)) [input]
7. ? [X0] : (predB(X0,"objectA") & reasonable(X0)) [input]
8. ~? [X0] : (predB(X0,"objectA") & reasonable(X0)) [negated conjecture 7]
10. ~? [X0] : (predB(X0,"objectA") & reasonable(X0) & ans0(X0)) [answer literal 8]
11. ! [X0] : (("objectC" = X0 | "objectB" = X0 | "objectA" = X0) | ~predA(X0)) [ennf transformation 5]
12. ! [X0] : ("objectC" = X0 | "objectB" = X0 | "objectA" = X0 | ~predA(X0)) [flattening 11]
13. ! [X0] : (reasonable(X0) | ("objectC" != X0 & "objectB" != X0 & "objectA" != X0)) [ennf transformation 6]
14. ! [X0] : (~predB(X0,"objectA") | ~reasonable(X0) | ~ans0(X0)) [ennf transformation 10]
15. ? [X0] : (predB(X0,"objectA") & predA(X0)) => (predB(sK1,"objectA") & predA(sK1)) [choice axiom]
16. predB(sK1,"objectA") & predA(sK1) [skolemisation 1,15]
20. predA(sK1) [cnf transformation 16]
21. predB(sK1,"objectA") [cnf transformation 16]
25. ~predA(X0) | "objectB" = X0 | "objectA" = X0 | "objectC" = X0 [cnf transformation 12]
26. reasonable(X0) | "objectA" != X0 [cnf transformation 13]
27. reasonable(X0) | "objectB" != X0 [cnf transformation 13]
28. reasonable(X0) | "objectC" != X0 [cnf transformation 13]
29. ~predB(X0,"objectA") | ~reasonable(X0) | ~ans0(X0) [cnf transformation 14]
30. reasonable("objectC") [equality resolution 28]
31. reasonable("objectB") [equality resolution 27]
32. reasonable("objectA") [equality resolution 26]
33. ~reasonable(sK1) | ~ans0(sK1) [resolution 29,21]
37. "objectC" = sK1 | "objectA" = sK1 | "objectB" = sK1 [resolution 25,20]
38. ~reasonable("objectC") | ~ans0("objectC") | "objectA" = sK1 | "objectB" = sK1 [superposition 33,37]
43. "objectB" = sK1 | "objectA" = sK1 | ~ans0("objectC") [subsumption resolution 38,30]
46. ~reasonable("objectB") | ~ans0("objectB") | "objectA" = sK1 | ~ans0("objectC") [superposition 33,43]
50. "objectA" = sK1 | ~ans0("objectB") | ~ans0("objectC") [subsumption resolution 46,31]
55. ~reasonable("objectA") | ~ans0("objectA") | ~ans0("objectB") | ~ans0("objectC") [superposition 33,50]
59. ~ans0("objectA") | ~ans0("objectB") | ~ans0("objectC") [subsumption resolution 55,32]
60. ans0(X0) [answer literal]
61. $false [unit resulting resolution 60,60,60,59]
% SZS output end Proof for test

Note that the query is true, but there is no fixed answer. This is because Vampire can only deduce that the answer is one of objectA, objectB, or objectC (see deduction 37, then 59 above), not which of them it is. If on the other hand we supply

fof(b1, axiom, predB("objectB", "objectA")).

Then we get an answer. Does this help?

somehow omit sK1 from the answer

This is tricky. The above notwithstanding, I can still imagine a situation where Vampire deduces e.g. sK128 = "objectZ", but returns sK128 instead of "objectZ". There might be some ways around this, I'll ask around internally if this becomes a problem for you.

Btw, your current a2_1, a2_2, a2_3, a3 could be written more concisely if you wish as

fof(a_def, axiom, ![X]: (predA(X) <=> (X = "objectA" | X = "objectB" | X = "objectC"))).
logicReasoner commented 2 years ago

The above information was indeed very helpful! :+1: I am starting to understand more and more things now. Thanks once again! :)

Note that the query is true, but there is no fixed answer. This is because Vampire can only deduce that the answer is one of objectA, objectB, or objectC (see deduction 37, then 59 above), not which of them it is.

This is fine, as long as I get (a list of) all valid answers (constant symbols). This was my original intention anyway. Is there a way to instruct vampire to return ALL valid answers, not just one of them? Then perhaps, I can filter out all sk*.

This is tricky. The above notwithstanding, I can still imagine a situation where Vampire deduces e.g. sK128 = "objectZ", but returns sK128 instead of "objectZ". There might be some ways around this, I'll ask around internally if this becomes a problem for you.

Yes, please let me know if you find any ways around this.

BTW, I've shortened some and further commented other parts of the problem based on your suggestions above:

% there exists some object X from the Domain(one of) that satisfies both predicates predA and predB
fof(a1,axiom,
    ( ? [X]: ( predA(X) & predB(X,"objectA") ) )).

% "objectA" satisfies predA, "objectB" satisfies predA, etc.
% No longer needed. Covered by `a3` below
%fof(a2_1,axiom,( predA("objectA") )).
%fof(a2_2,axiom,( predA("objectB") )).
%fof(a2_3,axiom,( predA("objectC") )).

% for all X, if predA is satisfied for X, then X MUST BE (a constraint of sorts) one of `objectA, objectB, objectC` and can NOT be anything else
% Now also covers a2_1,... a2_3
fof(a3,axiom,
    ( ! [X]:
        ( predA(X)
       <=> ( X = "objectA"
          | X = "objectB"
          | X = "objectC" ) ) )).

% some unrelated to the above axiom - another "objectD" and predD (just so that we have at least one other constant symbol which is NOT a VALID solution to the question below)
fof(a5_1,axiom,( predD("objectD") )).

% should be no longer needed since we use quotes around each of objectA, objectB, etc.
%fof(distinct, axiom, "objectA" != "objectB" & "objectA" != "objectC" & "objectB" != "objectC").

%%%%%%%%
% Question, find all objects from our domain that satisfy predB with second arg "objectA"
fof(question,question,(?[X]: ( predB(X,"objectA") )) ).
%fof(question,question,(?[X]: ( predA(X) )) ). % just predA as a test
quickbeam123 commented 2 years ago

Note that the query is true, but there is no fixed answer. This is because Vampire can only deduce that the answer is one of objectA, objectB, or objectC (see deduction 37, then 59 above), not which of them it is.

This is fine, as long as I get (a list of) all valid answers (constant symbols). This was my original intention anyway. Is there a way to instruct vampire to return ALL valid answers, not just one of them? Then perhaps, I can filter out all sk*.

Before @MichaelRawson finds time to answer in full, let me just jump in with a quick remark. There is a difference between: "Your answer is one of A, B, or C, but I don't know which." and "A, B, and C all work as your answer.". In the letter case, Vampire is happy to enumerate the three. The former case is the one Michael had in mind.

logicReasoner commented 2 years ago

Before @MichaelRawson finds time to answer in full, let me just jump in with a quick remark. There is a difference between: "Your answer is one of A, B, or C, but I don't know which." and "A, B, and C all work as your answer.". In the letter case, Vampire is happy to enumerate the three. The former case is the one Michael had in mind.

@quickbeam123 Yeah, in the former case, we can't expect Vampire to return anything. Just status Theorem and an empty list / string for answers makes perfect sense. I am interested in the latter case where "A, B, and C all work as your answer." as in the above contrived example but how can we get Vampire to enumerate all of them? I call it using: ./vampire_z3_rel_static_master_5921 --avatar off -qa answer_literal --mode casc and get: % SZS answers Tuple [[sK1]|_] for experiment and no other answers. Is --avatar off -qa answer_literal the wrong combination of arguments here?

quickbeam123 commented 2 years ago

Hmm, the problem seems to be that each distinct answer from Vampire essentially comes from a distinct proof. In this case, it becomes a problem that the superposition calculus in Vampire is smart and can come up with a proof using sK1. Now knowing that any other proof would have to be more complicated than that, artificially employing the extra assumption such as

fof(a3,axiom,
    ( ! [X]:
        ( predA(X)
       <=> ( X = "objectA"
          | X = "objectB"
          | X = "objectC" ) ) )).

it seems Vampire stops trying. Maybe @selig could shed more light on how subsequent proofs (=answers) are enumerated.

logicReasoner commented 2 years ago

Yeah, it makes sense to first use the most optimal (fastest) strategy in order to find out if there's a proof at all. But once a proof is found, if the query type is question, then perhaps it makes sense to do a more exhaustive search (involving a different strategy?) for all valid answers?

MichaelRawson commented 2 years ago

@quickbeam123 is right on the money, as usual!

I am interested in the latter case where "A, B, and C all work as your answer." as in the above contrived example

I am reasonably certain that this is not the case for the example (@quickbeam123 - can you correct me here?). sK1 is one of A/B/C but not all of them. If on the other hand we had

fof(a_a, axiom, predB("objectA", "objectA")).
fof(a_a, axiom, predB("objectB", "objectA")).
fof(a_a, axiom, predB("objectC", "objectA")).

then we could get answers A, B, C. As it is the answer would have to be some kind of hypothetical disjunction-term A-or-B-or-C. As I recall people were looking into this kind of thing a bit, but by the sounds of things you actually want the former case, especially given the database-ish setting. Perhaps you want something like an intersection?

fof(intersection, axiom, ![X]: (intersection(X) <=> (predA(X) & predB(X, "objectA")))).
fof(question, question, ?[X]: intersection(X)).

but how can we get Vampire to enumerate all of them?

Again, I think others were looking into this solving question properly. But a very-simple solution is to add extra constraints to the conjecture iteratively, such that they block the answers computed so far.

fof(question,question, ?[X]: (predA(X))). % get answer "objectA"
fof(question,question, ?[X]: (predA(X) & X != "objectA")). % "objectB"
fof(question,question, ?[X]: (predA(X) & X != "objectA" & X != "objectB")). % "objectC"
fof(question,question, ?[X]: (predA(X) & X != "objectA" & X != "objectB" & X != "objectC")).  % no dice

This has some obvious scalability problems but does seem to work for your purposes.

./vampire_z3_rel_static_master_5921 --avatar off -qa answer_literal --mode casc

--mode casc is overkill for now. -av off -qa answer_literal suffices. You might well need a strategy schedule later if you are trying harder or more-varied problems.

logicReasoner commented 2 years ago

I am reasonably certain that this is not the case for the example (@quickbeam123 - can you correct me here?). sK1 is one of A/B/C but not all of them. If on the other hand we had

Yes, you're right, @MichaelRawson . My mistake. The original (much more complex) problem which I was trying to solve had axiom1 as given, that "for one of the objects in the domain, it is true that predA(X) & predB(X,"objectA")". Later, while investigating why I was getting the skolem symbol sK1 instead of one of objectA, objectB and objectC , I simplified the problem to the contrived example posted above while still getting sK1 as an answer, but now the semantics seem to have changed since some of the other axioms are missing. So yes, for the given example above, "Your answer is one of A, B, or C, but I don't know which." is the valid response. Something like "objectA"|"objectB"|"objectC".

I will work a bit more on the contrived example above with the goal of adding any relevant axioms so that it is closer to the original

selig commented 2 years ago

I should read through the full conversation before responding but let me first add

giles$ ./vampire_rel_qa --question_answering answer_literal -av off prob
% Running in auto input_syntax mode. Trying TPTP
% SZS answers Tuple [([objectB]|[objectC]|[objectA]),([objectC]|[objectB]|[objectA]),[sK1]|_] for prob

There is an experimental branch qa where I've added a few things to question answering mode, including the fact that it doesn't 'stop' on the first answer. But it's also not currently clever enough not to return the same answer more than once. That would be quick to fix, perhaps I should!

Also, Vampire is definitely not clever enough to work out there are no more answers... but it definitely could do that (by detecting domain closure axioms as you mention).

Now I'll read the conversation ;)

logicReasoner commented 2 years ago

% SZS answers Tuple [([objectB]|[objectC]|[objectA]),([objectC]|[objectB]|[objectA]),[sK1]|_] for prob

Wow! That was exactly I was looking for from the beginning :D I wasn't aware of the existence of the qa branch :P Before diving into the adventure of compiling Vampire from that branch (using the release binary until now was way too easy :D ), should I try to merge from master (This branch is [4 commits ahead], [177 commits behind] master.)?

Also, Vampire is definitely not clever enough to work out there are no more answers... but it definitely could do that (by detecting domain closure axioms as you mention).

That would be great as long as it's not a very difficult problem!

selig commented 2 years ago

Right, instead of reading through and making sensible comments I felt like coding so I hacked something. The qa branch is now up to date with master and does this

giles$ ./vampire_dbg_qa --question_answering answer_literal -av off -qa_io on prob
% Running in auto input_syntax mode. Trying TPTP
% SZS answers Tuple [([objectC]|[objectB]|[objectA])|_] for prob

The new qa_io option filters out any answers that contain symbols that weren't in the input and answer extraction now skips repeated outputs (note that as implemented this can lead to non-termination for the same reasons that asking for a certain number of answers with questionCount will).

I will eventually read through the above thread to comment on the other dimensions but one comment is that knowing that you have all answers is not something that's possible generally but it sounds like you have some restrictions on the input that might make it possible.

logicReasoner commented 2 years ago

@selig Awesome! Thank you very much for putting in the effort. I will compile from the qa branch tomorrow and provide any feedback! :)

logicReasoner commented 2 years ago

@selig After a long fight with z3 on how to build a static library, I finally got vampire_z3_rel_static_qa_6176 compiled and working!

I got the expected answer: % SZS answers Tuple [(["objectC"]|["objectB"]|["objectA"])|_] for prob! :)

Also, I have no idea what kinds of optimizations you guys have implemented since version 4.6.1 (master_5921), but it's also amazingly fast (10x faster) on the contrived problem above (perhaps because I compiled it using clang on my debian system?): vampire_z3_rel_static_qa_6176 -av off -qa answer_literal -qa_io on --input_syntax tptp -t 30s: real 0m0.003s user 0m0.000s sys 0m0.003s

versus vampire_z3_rel_static_master_5921 --avatar off -qa answer_literal --input_syntax tptp -t 30s: real 0m0.031s user 0m0.027s sys 0m0.004s

Another question: is it worth enabling --mode casc --cores 3 so that I can use more cores (when available on the system) for bigger problems? Obviously, in the case of the simple contrived problem above, it is overkill and makes things slower. But for problems with thousands of axioms, things may be different, right?

I will continue experimenting with bigger problems and get back to you if I notice anything weird or unexpected.

PS With vampire_z3_rel_static_qa_6176, I get a funny warning (how can I resolve it or is it safe to ignore?): perf_event_open failed (instruction limiting will be disabled): Permission denied

quickbeam123 commented 2 years ago

It is safe to ignore perf_event_open failed. I will look at ways to make it not bother you in the next version.

logicReasoner commented 2 years ago

So while trying to debug a more complex problem, I noticed the following phenomenon: if both axioms a_102_2_2 and a_102_3 are added, vampire gets into an infinite loop trying to find answers matching the question below. Note that either axiom works fine on its own but both together cause the loop.

% a comparative relation
fof(a_102_2_1,axiom,(![X,Y]: ((smallerThan(X,Y)) => ~smallerThan(Y,X)))).
% if two numbers are NOT equal and not(X < Y), then Y < X
fof(a_102_2_2,axiom,(![X,Y]: ((~smallerThan(X,Y) & X !=Y) => smallerThan(Y,X)))).

% transitive rule
fof(a_102_3,axiom,(![X,Y,Z]: ((smallerThan(X,Y) & smallerThan(Y,Z)) => smallerThan(X,Z)))).

fof(a_11_311,axiom, smallerThan("A","B")).
fof(a_11_312,axiom, smallerThan("B","C")).

%%%
fof(question,question,(?[X,Y]:(smallerThan(X,Y)))).

So how can I avoid that issue? Is it better to use a helper predicate like not_smallerThan instead of using ~smallerThan or is there an alternative way to express such relations?

IIRC, In Prolog, one could use tabling or the cut operator ( ! ) to avoid entering such infinite loops. Is there something similar for ATPs like Vampire?

selig commented 2 years ago

Sorry, I should have mentioned this. In the new branch, there's a question_count option

giles$ ./vampire_dbg_qa -explain qc
--question_count (-qc)
    The max number of answers you want, 0 means as many as possible, which 
    could lead to non-termination.
    default: 0

The problem here is that there is no finite saturation of these rules so Vampire is doing the right thing by continuing to generate more and more consequences.

If we ask for three answers we get

giles$ ./vampire_dbg_qa --question_answering answer_literal -av off -qa_io off prob -qc 3
% Running in auto input_syntax mode. Trying TPTP
% SZS answers Tuple [["A","C"],["B","C"],["A","B"]|_] for prob

which looks good but then if we ask for more we start to see these partial solutions that Vampire is building up

giles$ ./vampire_dbg_qa --question_answering answer_literal -av off -qa_io off prob -qc 4
% Running in auto input_syntax mode. Trying TPTP
% SZS answers Tuple [([X0,"B"]|["A",X0]|[X0,"B"]),["A","C"],["B","C"],["A","B"]|_] for prob

notice that the solution ([X0,"B"]|["A",X0]|[X0,"B"]) says either X0 < B or A < B or B < X0 for any X0, which may or may not be useful.

Playing with question count I found up to 8 unique answers but no more. As before, there's no current way to know if these are all of them.

A feature I could and should hack in is for Vampire to report all the answers found so far on timeout i.e. if you ask for 9 and there are only 8 you still get those 8 when it times out. I'll do that... 'soon'.

logicReasoner commented 2 years ago

--question_count (-qc)

Yes, that option seems to be the missing piece of the puzzle :)

notice that the solution ([X0,"B"]|["A",X0]|[X0,"B"]) says either X0 < B or A < B or B < X0 for any X0, which may or may not be useful.

Not very useful in practice since while it's technically correct, it's also way too general. Say, if I wanted to avoid any partial solutions, then would a viable strategy be to start with a low value, e.g. -qc 2 and if as a result, all answers contain ONLY defined constant symbols like A, B, C, then to consequently increase the max number until the first partial solution pops up? Then we would know that -qc X is what we need (where X is the value before the first partial solution pops up == 3 in the example above). Something like iterative deepening? :)

Playing with question count I found up to 8 unique answers but no more. As before, there's no current way to know if these are all of them.

This is also where the total computation time starts to rise by a lot, i.e. -qc 8 is still really fast while -qc 9 takes x100 longer. Just an idea: is there some notable (significant) change in the internal state of the model once the first 8 answers are found and then Vampire starts looking for the 9th (e.g. some boolean flag turning from true to false or something like that that would mean that Vampire has detected that there is no finite saturation)? If you could somehow detect when that moment occurs (by using some heuristics or whatever), then (for the qa branch at least) you could also add a new option to Vampire something like --stop_when_no_finite_saturation which would be switched off by default. But if set to on, then in the above use case, that option would make Vampire stop once the 8th answer is found?

A feature I could and should hack in is for Vampire to report all the answers found so far on timeout i.e. if you ask for 9 and there are only 8 you still get those 8 when it times out. I'll do that... 'soon'.

Yes, that would be an alternative to my suggested idea above but at the cost of the additional time for the timeout to occur.

MichaelRawson commented 2 years ago

I'm now completely out of my depth on the question-answering stuff, but you seem to have found expert assistance. :-)

Also, I have no idea what kinds of optimizations you guys have implemented since version 4.6.1 (master_5921), but it's also amazingly fast (10x faster) on the contrived problem above

No idea! I keep an eye on perf-related stuff and the core of Vampire hasn't gotten much faster. At least for proving theorems, rather than answering questions. Maybe @selig's question-answering update fixed something?

Another question: is it worth enabling --mode casc --cores 3 so that I can use more cores (when available on the system) for bigger problems? Obviously, in the case of the simple contrived problem above, it is overkill and makes things slower. But for problems with thousands of axioms, things may be different, right?

Sure, it could be. But the main aim of --mode casc is not really to use more cores (although it can), and definitely not to solve bigger problems. It just iterates through a (mostly) pre-defined list of combinations-of-options that heuristically solves a lot of different kinds of problems. This is useful when the user doesn't know which options are best for their particular problem. If you have mostly-similar problems, once you have a configuration that works I'd spend some time tweaking options to see what works, what doesn't, and what has no effect. Let us know if you need help with that.

logicReasoner commented 2 years ago

Sure, it could be. But the main aim of --mode casc is not really to use more cores (although it can), and definitely not to solve bigger problems.

Gotcha! I only add --mode casc because otherwise, I am not allowed to specify more than one core, e.g. --cores 3 to be used. So If I understand correctly, even if I specify --mode casc --cores 3, then each solving strategy (where a strategy is just a pre-defined list of combinations-of-options) will only use a single core core at a time, but --cores 3 would allow for 3 distinct solving strategies to be run in parallel each of which would use 1 core?

MichaelRawson commented 2 years ago

Correct! Making use of multiple cores inside one strategy is not really a thing yet in first-order theorem proving, and I think it's unlikely in the near future. The way Vampire organises proof search is inherently sequential, at least as far as I can see.

selig commented 2 years ago

We can add any 'filter' on answers that we want as we did with filtering out introduced symbols. For example, filter out any answers containing variables.

Vampire doesn't detect that there's no finite saturation and the undecidability of first-order logic tells us that we can't do that in general. Perhaps there could be heuristics to detect lack of saturation in particular cases but we've not invested in them as our main focus with saturation is on finding proofs.

I think the best thing here is to define precisely what kinds of answers are acceptable. We filter out the rest and then we rely on a timeout mechanism. If a coarse-grained timeout mechanism isn't very attractive then we could introduce a 'time since last answer' i.e. stop looking when it's been more than 1 second since you found an answer.

logicReasoner commented 2 years ago

I think the best thing here is to define precisely what kinds of answers are acceptable. We filter out the rest and then we rely on a timeout mechanism.

This is worth thinking more about. If variables start appearing as part of solutions (e.g. [([X0,"B"]|["A",X0]|[X0,"B"]),...|_]), then this means that the original problem is underspecified, right? Perhaps, I need to add more axioms (rules) in order to in the end, get solutions with just constant symbols (A,B, C) in them, i.e. only the following 3: ["A","C"],["B","C"],["A","B"]? Any tips for the above example problem?

we could introduce a 'time since last answer' i.e. stop looking when it's been more than 1 second since you found an answer.

This sounds like the best compromise so far :+1:

k00ni commented 5 months ago

@logicReasoner I know this issue is dated, but could you please post a complete example of your TPTP-code as well as the Vampire-command? It would be very appreciated.

quickbeam123 commented 3 months ago

There has been an recent update in question answering capabilities of Vampire; see https://easychair.org/smart-slide/slide/dhkS

The commands corresponding to the questions discussed here are:

$ ./vampire --input_syntax tptp -qa plain questionFromGithub.p 
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub
% SZS answers Tuple [[X->sK1]|_] for questionFromGithub
%    sK1 introduced for X0 in 1. ? [X0] : (predB(X0,"objectA") & predA(X0)) [input(axiom)]

Note that the skolems get an explanation.

Unwanted answer can be filtered out (but must be filtered out explicitly):

$ ./vampire --input_syntax tptp -qa plain questionFromGithub.p -qaat "ans0(sK1)"
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub
% SZS answers Tuple [([X->"objectA"]|[X->"objectB"]|[X->"objectC"])|_] for questionFromGithub
% SZS output start Proof for questionFromGithub

As discussed, the example here (using the version from May 20, 2022) leads to a disjunctive answer if the skolem is rejected by the user.

Concerning multiple answers, looking at the other example (from May 24, 2022):

./vampire --input_syntax tptp -qa plain questionFromGithub2.p                 
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub2
% SZS answers Tuple [[X->"B",Y->"C"]|_] for questionFromGithub2
...
./vampire --input_syntax tptp -qa plain questionFromGithub2.p -qaat 'ans0("B","C")'
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub2
% SZS answers Tuple [[X->"A",Y->"B"]|_] for questionFromGithub2
...
./vampire --input_syntax tptp -qa plain questionFromGithub2.p -qaat 'ans0("B","C")|ans0("A","B")' 
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub2
% SZS answers Tuple [[X->"A",Y->"C"]|_] for questionFromGithub2
...
./vampire --input_syntax tptp -qa plain questionFromGithub2.p -qaat 'ans0("B","C")|ans0("A","B")|ans0("A","C")'
% Refutation found. Thanks to Tanya!
% SZS status Theorem for questionFromGithub2
% SZS answers Tuple [∀X0.([X->"A",Y->X0]|[X->X0,Y->"B"])|_] for questionFromGithub2
...

Note that universal variables in answers are explicitly quantified to denote the scope.