Open Some-random opened 1 year ago
Looks like the error mostly occurs when there are multiple potential reasoning paths for the problem (although only one of them would be correct). There are two more examples from dev set to back this assumption:
ProofWriter_RelNoneg-OWA-D5-861_Q3
The cat needs the dog
The dog is kind
The dog needs the rabbit
The rabbit eats the dog
The rabbit is green
The tiger chases the rabbit
The tiger is big
If something is nice and it eats the tiger then it needs the dog
If something chases the cat then it eats the cat
If the dog chases the tiger then the tiger needs the rabbit
If the rabbit chases the tiger then the rabbit chases the cat
If something chases the cat then the cat is kind
If something eats the dog then it chases the cat
If something is rough then it eats the dog
If something is kind then it is rough
If something eats the rabbit and the rabbit is big then it is kind
Based on the above information, is the following statement true, false, or unknown? The rabbit chases the cat.
Predicates:
Needs($x, $y, bool) ::: Does x need y?
Kind($x, bool) ::: Is x kind?
Eats($x, $y, bool) ::: Does x eat y?
Green($x, bool) ::: Is x green?
Chases($x, $y, bool) ::: Does x chase y?
Big($x, bool) ::: Is x big?
Nice($x, bool) ::: Is x nice?
Rough($x, bool) ::: Is x rough?
Facts:
Needs(Cat, Dog, True) ::: The cat needs the dog.
Kind(Dog, True) ::: The dog is kind.
Needs(Dog, Rabbit, True) ::: The dog needs the rabbit.
Eats(Rabbit, Dog, True) ::: The rabbit eats the dog.
Green(Rabbit, True) ::: The rabbit is green.
Chases(Tiger, Rabbit, True) ::: The tiger chases the rabbit.
Big(Tiger, True) ::: The tiger is big.
Rules:
Nice($x, True) && Eats($x, Tiger, True) >>> Needs($x, Dog, True) ::: If something is nice and it eats the tiger then it needs the dog.
Chases($x, Cat, True) >>> Eats($x, Cat, True) ::: If something chases the cat then it eats the cat.
Chases(Dog, Tiger, True) >>> Needs(Tiger, Rabbit, True) ::: If the dog chases the tiger then the tiger needs the rabbit.
Chases(Rabbit, Tiger, True) >>> Chases(Rabbit, Cat, True) ::: If the rabbit chases the tiger then the rabbit chases the cat.
Chases($x, Cat, True) >>> Kind(Cat, True) ::: If something chases the cat then the cat is kind.
Eats($x, Dog, True) >>> Chases($x, Cat, True) ::: If something eats the dog then it chases the cat.
Rough($x, True) >>> Eats($x, Dog, True) ::: If something is rough then it eats the dog.
Kind($x, True) >>> Rough($x, True) ::: If something is kind then it is rough.
Eats($x, Rabbit, True) && Big(Rabbit, True) >>> Kind($x, True) ::: If something eats the rabbit and the rabbit is big then it is kind.
Query:
Chases(Rabbit, Cat, True) ::: The rabbit chases the cat.
ProofWriter_RelNoneg-OWA-D5-745_Q5:
The bear is round
The bear likes the cow
The bear visits the lion
The cow is big
The cow is rough
The lion needs the bear
The mouse likes the lion
If someone visits the lion then they are round
If someone needs the bear then they need the lion
If someone needs the lion then they visit the lion
If someone visits the lion and the lion is round then they like the lion
If someone is rough then they need the lion
If someone likes the lion and they need the lion then the lion visits the mouse
If someone is rough then they visit the lion
If someone likes the lion and the lion is big then they are kind
Based on the above information, is the following statement true, false, or unknown? The lion visits the lion.
Predicates:
Round($x, bool) ::: Is x round?
Likes($x, $y, bool) ::: Does x like y?
Visits($x, $y, bool) ::: Does x visit y?
Big($x, bool) ::: Is x big?
Rough($x, bool) ::: Is x rough?
Needs($x, $y, bool) ::: Does x need y?
Kind($x, bool) ::: Is x kind?
Facts:
Round(Bear, True) ::: The bear is round.
Likes(Bear, Cow, True) ::: The bear likes the cow.
Visits(Bear, Lion, True) ::: The bear visits the lion.
Big(Cow, True) ::: The cow is big.
Rough(Cow, True) ::: The cow is rough.
Needs(Lion, Bear, True) ::: The lion needs the bear.
Likes(Mouse, Lion, True) ::: The mouse likes the lion.
Rules:
Visits($x, Lion, True) >>> Round($x, True) ::: If someone visits the lion then they are round.
Needs($x, Bear, True) >>> Needs($x, Lion, True) ::: If someone needs the bear then they need the lion.
Needs($x, Lion, True) >>> Visits($x, Lion, True) ::: If someone needs the lion then they visit the lion.
Visits($x, Lion, True) && Round(Lion, True) >>> Likes($x, Lion, True) ::: If someone visits the lion and the lion is round then they like the lion.
Rough($x, True) >>> Needs($x, Lion, True) ::: If someone is rough then they need the lion.
Likes($x, Lion, True) && Needs($x, Lion, True) >>> Visits(Lion, Mouse, True) ::: If someone likes the lion and they need the lion then the lion visits the mouse.
Rough($x, True) >>> Visits($x, Lion, True) ::: If someone is rough then they visit the lion.
Likes($x, Lion, True) && Big(Lion, True) >>> Kind($x, True) ::: If someone likes the lion and the lion is big then they are kind.
Query:
Visits(Lion, Lion, True) ::: The lion visits the lion.
I encountered the same issue, when solving ProntoQA using pyke. One typical example is the following. The problem is ProntoQA_10:
{
"id": "ProntoQA_10",
"context": "Every impus is earthy. Each impus is a jompus. Jompuses are small. Jompuses are rompuses. Rompuses are not amenable. Rompuses are wumpuses. Wumpuses are wooden. Wumpuses are zumpuses. Every zumpus is temperate. Every zumpus is a dumpus. Dumpuses are dull. Dumpuses are vumpuses. Every vumpus is not shy. Every yumpus is sweet. Vumpuses are numpuses. Numpuses are not sweet. Numpuses are tumpuses. Fae is a wumpus.",
"question": "Is the following statement true or false? Fae is sweet.",
"options": [
"A) True",
"B) False"
],
"answer": "B",
"explanation": [
"Fae is a wumpus.",
"Wumpuses are zumpuses.",
"Fae is a zumpus.",
"Every zumpus is a dumpus.",
"Fae is a dumpus.",
"Dumpuses are vumpuses.",
"Fae is a vumpus.",
"Vumpuses are numpuses.",
"Fae is a numpus.",
"Numpuses are not sweet.",
"Fae is not sweet."
]
}
And the formation from natural language to program is also correct:
fact1
foreach
facts.Impus($x, True)
assert
facts.Earthy($x, True)
fact2
foreach
facts.Impus($x, True)
assert
facts.Jompus($x, True)
fact3
foreach
facts.Jompus($x, True)
assert
facts.Small($x, True)
fact4
foreach
facts.Jompus($x, True)
assert
facts.Rompus($x, True)
fact5
foreach
facts.Rompus($x, True)
assert
facts.Amenable($x, False)
fact6
foreach
facts.Rompus($x, True)
assert
facts.Wumpus($x, True)
fact7
foreach
facts.Wumpus($x, True)
assert
facts.Wooden($x, True)
fact8
foreach
facts.Wumpus($x, True)
assert
facts.Zumpus($x, True)
fact9
foreach
facts.Zumpus($x, True)
assert
facts.Temperate($x, True)
fact10
foreach
facts.Zumpus($x, True)
assert
facts.Dumpus($x, True)
fact11
foreach
facts.Dumpus($x, True)
assert
facts.Dull($x, True)
fact12
foreach
facts.Dumpus($x, True)
assert
facts.Vumpus($x, True)
fact13
foreach
facts.Vumpus($x, True)
assert
facts.Shy($x, False)
fact14
foreach
facts.Yumpus($x, True)
assert
facts.Sweet($x, True)
fact15
foreach
facts.Vumpus($x, True)
assert
facts.Numpus($x, True)
fact16
foreach
facts.Numpus($x, True)
assert
facts.Sweet($x, False)
fact17
foreach
facts.Numpus($x, True)
assert
facts.Tumpus($x, True)
But after giving these to pyke, the output prediction is A rather than the correct answer B. Is there some problems with pyke? Is pyke giving the correct answer?
I'm conducting trials with ProofWriter and stumbled upon an instance where, despite the formalization appearing accurate, it's yielding inaccurate results.
The Problem is is
ProofWriter_RelNoneg-OWA-D5-775_Q9
, it's taken from the development set.The textual context is:
The question is:
Based on the above information, is the following statement true, false, or unknown? The tiger needs the rabbit.
The formalized Prolog code is:
The predicted result and ground truth result are:
This raises the question: provided that all the facts, predicates, and rules are correctly formalized, can Pyke always generate the right answer for a query?