Pattern matcher branch exploration problem

jswiergo commented 9 years ago

There is a bug in pattern matcher for queries having common atoms shared by different sub-patterns (sub-tree branches). See example:

(define (SharedAtom)
        (AndLink
                (ConceptNode "A")
        )
)

(define (Tree)
        (ListLink
                (SharedAtom)
                (ListLink
                        (SharedAtom)
                        (ConceptNode "B")
                )
        )
)
(Tree)

(define (query1)
        (AndLink
                (VariableNode "$a")
        )
)

(define (query2)
        (VariableNode "$a")
)

(define (tree-query input-query)
        (BindLink
                (VariableList
                        (VariableNode "$a")
                        (VariableNode "$b")
                )
                (AndLink
                        (ListLink
                                (input-query)
                                (ListLink
                                        (input-query)
                                        (VariableNode "$b")
                                )
                        )
                )
                (ListLink
                        (VariableNode "$a")
                        (VariableNode "$b")
                )
        )
)

(cog-bind (tree-query query1))
(cog-bind (tree-query query2))

The answer is:

$2 = (SetLink
   (ListLink
      (ConceptNode "A")
      (ConceptNode "B")
   )
   (ListLink
      (ConceptNode "A")
      (ConceptNode "B")
   )
)

$3 = (SetLink
   (ListLink
      (AndLink
         (ConceptNode "A")
      )
      (ConceptNode "B")
   )
)

The answer for query1 is duplicated. The right size of satisfying set is 1 for both queries. This is simple example but this may bring weird problems for more complicated settings.

I have checked that it happens when there exists an atom shared by sub-patterns and moreover the main pattern is explored recursively bottom-up. Then all branches are explored. The difference between this two queries is because the pattern matcher uses "choose thinnest term strategy" and finds as starters different terms. For query1 the starter is a bottom atom:

Start term is: (AndLink (stv 1,000000 0,000000)
  (VariableNode "$a") ; [7]
) ; [10]

For query2 the starter is a top atom:

Start term is: (ListLink (stv 1,000000 0,000000)
  (VariableNode "$a") ; [7]
  (ListLink (stv 1,000000 0,000000)
    (VariableNode "$a") ; [7]
    (VariableNode "$b") ; [8]
  ) ; [14]
) ; [19]

It seems that the shared pattern should be explored only in one location and this is a fix. I am not sure yet.

jswiergo commented 9 years ago

Here is more complex example showing how branches are missed as well as duplicated.

Test case: https://github.com/jswiergo/atomspace/blob/shared_atoms_example/tests/query/shared-atoms.scm Actual output: https://github.com/jswiergo/atomspace/blob/shared_atoms_example/tests/query/shared-atoms-actual.scm Expected output: https://github.com/jswiergo/atomspace/blob/shared_atoms_example/tests/query/shared-atoms-expected.scm

The missing branch effect is caused by changes of the same permutation state (_perm_state map) for the same shared unordered links in different search branches.

linas commented 9 years ago

Wow. OK, interesting bugs. Your explanation makes sense. Not sure what the fix should be; its not immediately obvious how to detect repeated sub-patterns.

bgoertzel commented 9 years ago

Wow.. this is some heroic bug-finding, Jacek !!

On Wed, Jul 15, 2015 at 4:22 PM, Linas Vepštas notifications@github.com wrote:

Wow. OK, interesting bugs. Your explanation makes sense. Not sure what the fix should be; its not immediately obvious how to detect repeated sub-patterns.

— Reply to this email directly or view it on GitHub https://github.com/opencog/atomspace/issues/148#issuecomment-121735231.

Ben Goertzel, PhD http://goertzel.org

"The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." -- George Bernard Shaw

linas commented 9 years ago

as mentioned in pull request #86 the solution for the unordered link is this:

in the case of shared unordered links there is as well a problem 
with _perm_state structure that keeps states of permutations being explored.
Now the structure is a map:
("sub-pattern handle","grounding handle") => mutation.
But it should consider sharing as well:
("sub-pattern handle","position in main pattern","grounding handle") => mutation

jswiergo commented 9 years ago

I have tested the solution using ("sub-pattern handle", "position in main pattern", "grounding handle") => mutation maps. Unfortunately it turns out that this change alone is not sufficient, because there are other problems. For example when PM founds one valid grounding for some permutations (and calls accept_clause method) then after that it is not able to continue to explore next permutations. It just switches to explore next branch in the grounding graph and skips potentially good permutations.

I have found difficult to understand the truth tables for flags "take_step", "have_more" for corner cases (e.g. in bottom-up search, when accepting clauses). It seems current design of this flags is quite error-prone. I am trying to simplify and refactor this design and create some pictures for illustration. Will let you know.

jswiergo commented 9 years ago

Here is detailed explanation: https://github.com/jswiergo/atomspace/blob/shared_atoms_example/opencog/query/README-Unordered-Links.md

linas commented 9 years ago

Wow. Yes, what you say makes sense. the take_step/have_more was the best I could come up with, at the time, it was hard to arrive even there. Yes, please do refactor!

bgoertzel commented 9 years ago

Jacek, the solution you mention seems to make sense from what I understand. Thanks, this is great stuff -- not many people can dig this deep!!

As for performance, I am not sure the hypothetical performance issues you mention will actually be problematic given real Atomspaces. At a guess if the unordered links are not too long the performance may be fine (though I guess that's an oversimplification)...

The use of depth first search seems a good approach, I can't think of a better one at the moment...

ben

On Mon, Aug 10, 2015 at 9:47 PM, Linas Vepštas notifications@github.com wrote:

Wow. Yes, what you say makes sense. the take_step/have_more was the best I could come up with, at the time, it was hard to arrive even there. Yes, please do refactor!

— Reply to this email directly or view it on GitHub https://github.com/opencog/atomspace/issues/148#issuecomment-129562633.

Ben Goertzel, PhD http://goertzel.org

"The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man." -- George Bernard Shaw

linas commented 9 years ago

OK, one thing does not make sense: for issue 2, you state:

When the tree_compare method finds the first good mutation this matching is reported to the caller for acceptance. Till that moment everything goes right. But then it stops processing current proposed grounding term, switches to another proposed term and continues to search next mutations. This is the reason why it may miss some good mutations.

This is not right. Yes, when a match is found, matching moves on to the next term. This continues until all terms are exhausted. At that point, the matching is supposed to backtrack to the unordered link, and try the remaining untried permutations. I am pretty sure that this works, and I am pretty sure that there are unit tests for this. This is why there are stacks: the current permutation state is saved, so that, on backtracking, it can be restored, and the remaining permutations tried.

linas commented 9 years ago

I also read problem 3 and 4 more carefully, just now. So, problem 4 is what started this issue, there is a design-fault there. For problem 3, I remember thinking about how permuations need to be "frozen" at certain points, and perhaps there is a bug there.

I also re-read your description of problem 2 again; perhaps this is a corner case, where the very first, or the very last permutation is sometimes skipped? Still I really thought that there were unit tests that tried these cases....

jswiergo commented 9 years ago

Yes, unit tests cover some cases. For example the first answer below is correct. For the second query below you may observe the result I have described. It breaks after the first match is found and does not continue to search remaining permutations.

In 1:

(ListLink
  (UnorderedLink
    (ConceptNode "A")
    (ConceptNode "B")
  )
)

(cog-bind
  (BindLink
    (VariableList
      (VariableNode "$a")
      (VariableNode "$b")
    )
    (AndLink
      (ListLink
        (UnorderedLink
          (VariableNode "$a")
          (VariableNode "$b")
        )
      )
    )
    (ListLink
      (VariableNode "$a")
      (VariableNode "$b")
    )
  )
)

Out 1:

(SetLink
  (ListLink
    (ConceptNode "A")
    (ConceptNode "B")
  )
  (ListLink
    (ConceptNode "B")
    (ConceptNode "A")
  )
)

In 2:

(ListLink
  (UnorderedLink
    (ConceptNode "A")
    (ConceptNode "B")
  )
  (UnorderedLink
    (ConceptNode "C")
    (ConceptNode "D")
  )
)

(cog-bind
  (BindLink
    (VariableList
      (VariableNode "$a")
      (VariableNode "$b")
      (VariableNode "$c")
      (VariableNode "$d")
    )
    (AndLink
      (ListLink
        (UnorderedLink
          (VariableNode "$a")
          (VariableNode "$b")
        )
        (UnorderedLink
          (VariableNode "$c")
          (VariableNode "$d")
        )
      )
    )
    (ListLink
      (VariableNode "$a")
      (VariableNode "$b")
      (VariableNode "$c")
      (VariableNode "$d")
    )
  )
)

Out 2:

(SetLink
   (ListLink
      (ConceptNode "A")
      (ConceptNode "B")
      (ConceptNode "C")
      (ConceptNode "D")
   )
)

linas commented 9 years ago

OK. Well, the code was meant to backtrack and explore all permutations, so OK, there is a bug.

This sentence from "problem 2" made me nervous: 'Instead it should search all remaining mutations and compare them to the original grounding term, then switch to another proposed term after exhaustion and search the query tree from the beginning starting from the first mutation.'

I don't see why this is needed. Backtracking should allow the search to resume where it last left off. I see no need to "search till exhaustion".

jswiergo commented 9 years ago

Right, it is not needed, it just describes one of many ways how to reach superior goal: find all solutions.

Here is en example for clarification. (this is extended example 2 from above).

(ListLink
  (UnorderedLink
    (ConceptNode "A")
    (ConceptNode "B")
  )
  (UnorderedLink
    (ConceptNode "C")
    (ConceptNode "D")
  )
)

(ListLink
  (UnorderedLink
    (ConceptNode "A")
    (ConceptNode "B")
  )
  (UnorderedLink
    (ConceptNode "E")
    (ConceptNode "F")
  )
)

The query from example 2 remains unchanged. Assume that the starter term is:

(UnorderedLink
  (VariableNode "$a")
  (VariableNode "$b")
)

As far as I understand your intention is following search order:

(A,B,C,D)
(A,B,D,C)
(A,B,E,F)
(A,B,F,E)
(B,A,C,D)
(B,A,D,C)
(B,A,E,F)
(B,A,F,E)

Indeed it seems to be more natural and probably has a little better efficiency.

I described different approach:

(A,B,C,D)
(A,B,D,C)
(B,A,C,D)
(B,A,D,C)
--
(A,B,E,F)
(A,B,F,E)
(B,A,E,F)
(B,A,F,E)

I am not attached to this, then I just wanted to point out that some permutations are missed and how we can avoid it.

linas commented 9 years ago

There is no intended search order; search order is a by-product of the "simplest algorithm". In principle, I guess we could allow a callback that allows the user to specify the next permutation to try; in practice, this seems like wild over-engineering at this point, as no one would use it.

The current pattern matcher is built on a general model of saving state on a stack, and back-tracking to explore other possibilities; I want to stick to that model of stacks and back-tracking, mostly because if we place some other technique into the middle of this, it will just heighten confusion over the long-run.

linas commented 4 years ago

As of this time, I get

scheme@(guile-user)> (cog-bind (tree-query query1))
Obsolete! Do not use cog-bind, use cog-execute! instead.
$2 = (SetLink
   (ListLink
      (ConceptNode "A")
      (ConceptNode "B")
   )
)

scheme@(guile-user)> (cog-bind (tree-query query2))
Obsolete! Do not use cog-bind, use cog-execute! instead.
$3 = (SetLink
   (ListLink
      (AndLink
         (ConceptNode "A")
      )
      (ConceptNode "B")
   )
)

so the simplest example no longer applies. Looking at the more complex cses.

linas commented 4 years ago

After looking at https://github.com/jswiergo/atomspace/blob/shared_atoms_example/tests/query/shared-atoms.scm it appears that the current code misses half teh permutations of query1 but query2 looks correct. I'll add this to a unit test shortly. There are ongoing issues with unordered links. I had hoped that #2359 and #2357 had fixed all this, but apparently not :-(

linas commented 4 years ago

Closing. Issue #2388 includes the tests here as part of the unit test.

opencog / atomspace

Pattern matcher branch exploration problem #148