opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace

Unification with variables on both sides #874

Closed ngeiswei closed 7 years ago

ngeiswei commented 8 years ago

Overview

I need a function that would perform unification between atoms that both contain variables.

Context

The Backward Chainer requires that. The current BC has a unification function BackwardChainer::unify that is supposed to do that but does not.

Specification

I'll describe the specification by examples. I'm giving a simplified type signature of the function; specifically, I'm leaving aside variable declarations, but they should of course be taken into account in the final implementation.

Type signature

The type signature of the function is

HandleMapSeq unify(Handle, Handle)

Examples

1.

unify((Variable "$X"), (Concept "A"))
->
[{(Variable "$X"):(Concept "A")}]

2.

unify((Concept "A"), (Variable "$X"))
->
[{(Variable "$X"):(Concept "A")}]

3.

unify((Inheritance (Concept "A") (Concept "B")), (Variable "$X"))
->
[{(Variable "$X"):(Inheritance (Concept "A") (Concept "B"))}]

4.

unify((Inheritance (Concept "A") (Variable "$Y")),
      (Inheritance (Variable "$X") (Concept "B")))
->
[{(Variable "$X"):(Concept "A"),
  (Variable "$Y"):(Concept "B")}]

Etc. I didn't give an example with multiple mappings, but they would occur if, for instance, one of the terms contains an unordered link, thus allowing multiple possible permutations. I think that's enough to give the idea.

How to code it

So assuming such a function doesn't exist (I think not), the question is how it should be implemented. My spontaneous answer is to run 2 instances of pattern matching: one where the lhs is treated as the pattern and the rhs is put in a temporary atomspace as the grounding, and vice versa (the rhs is treated as the pattern and the lhs is put in a temporary atomspace as the grounding), then build the union of the mappings going both ways. The node_match and variable_match callbacks will have to be overloaded so that unification as described in example 4 becomes possible (namely, a node needs to accept a variable of the same or an inherited type as a match, etc.).

I feel my spontaneous suggestion has shortcomings and will not cover all cases, but it might be good enough for a while. Any better suggestion is welcome.

linas commented 8 years ago

I believe that there is a far more straight-forward implementation (based on your description): just compare the left and right sides, side-by-side, recursively. If one is a variable, and one is not, that's your unification.
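For concreteness, here is a toy sketch of that recursion in guile scheme, operating on plain s-expressions shaped like the examples above rather than on real Handles; the helper names (variable?, bind, unify) are purely illustrative, and it returns a single binding alist (or #f) instead of a HandleMapSeq:

(use-modules (srfi srfi-1))   ; for fold

; True if the expression is a variable, e.g. (Variable "$X").
(define (variable? atom)
  (and (pair? atom) (eq? (car atom) 'Variable)))

; Extend the binding alist, or fail (#f) on a conflicting assignment.
(define (bind var val bindings)
  (let ((prior (assoc var bindings)))
    (cond ((not prior) (cons (cons var val) bindings))
          ((equal? (cdr prior) val) bindings)
          (else #f))))

; Side-by-side recursive comparison: if one side is a variable, bind it;
; otherwise require the same type and arity and recurse on the outgoing sets.
(define (unify lhs rhs bindings)
  (cond ((not bindings) #f)                          ; a failure propagates up
        ((variable? lhs) (bind lhs rhs bindings))
        ((variable? rhs) (bind rhs lhs bindings))
        ((and (pair? lhs) (pair? rhs)
              (eq? (car lhs) (car rhs))
              (= (length lhs) (length rhs)))
         (fold unify bindings (cdr lhs) (cdr rhs)))
        ((equal? lhs rhs) bindings)
        (else #f)))

; Example 4 above:
;   (unify '(Inheritance (Concept "A") (Variable "$Y"))
;          '(Inheritance (Variable "$X") (Concept "B"))
;          '())
; binds (Variable "$X") to (Concept "A") and (Variable "$Y") to (Concept "B").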

The pattern matcher is kind-of overkill for this, because it tries to unify unordered sets of clauses: i.e.

unify(UnorderedHandleSet, UnorderedHandleSet)

so it has to deal with re-ordering, as well as making sure that there are no conflicting variable assignments. As well as enforcing variable type checking. As well as treating QuoteLink as a special thing -- and treating ChoiceLink as special -- and AbsentLink -- and PresentLink -- and evaluating boolean combinations that involve Absent/Present -- and being careful with unordered links. There's also some amount of monkey business with Lambda and evaluating things, etc. You did not indicate which of these things, if any, you may want.

I can write this code for you -- the basic case is very easy. (famous last words: "just an afternoon")

There is also something related, which I've started working on, and should have done many years ago, before the pattern matcher. It's the C++ API that MapLink uses: de facto, MapLink already does a half-unification -- I call it "half-unification" because it allows variables only on the left side, and never on the right. I can easily update that to allow variables on either side.

FWIW: MapLink is kind-of-like the pattern matcher, but instead of searching the entire atomspace, it limits its search to only the list of atoms that you supplied.

ngeiswei commented 8 years ago

Thanks Linas! I already started fixing it using the pattern matcher. I totally see what you mean by this recursive function; I probably would have done that if I had started from scratch, but I'm taking over someone else's code. Also, the pattern matcher already handles annoying stuff like exploring the various permutations of unordered links.

Ultimately I seem to agree: such a unification should be the general case of MapLink, and maybe we should unify all this at some point. But I'm also trying to move forward rapidly so that we finally have a BC.

linas commented 8 years ago

OK. I'm slowly trying to explore how to do pattern-matcher-like things, without requiring/needing the atomspace, and maybe improving performance(?) -- this partly explains why there is a MapLink.

While thinking about what you wrote, I concluded that we also need to invent a ValuationLink -- which is just a plain-old ListLink, except that the first atom must always be a variable, and the second atom can be anything. I've repeatedly struggled with the idea of returning (in atomese) the pair (variable, some value for it), and with finding some good way of indicating that's what it is. Just using a ListLink is OK, but for the human staring at it, you have no clue that some particular ListLink is "special", i.e. contains these variable+value pairs. It also makes type-checking hard.

Anyway, if the ValuationLink is invented, this would then allow a UnifyLink to also be invented, that does what you describe, and then returns a SetLink filled with ValuationLinks.
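To illustrate (purely hypothetical, since neither link type exists yet), example 4 from the top of this issue might then be phrased as

(UnifyLink
   (Inheritance (Concept "A") (Variable "$Y"))
   (Inheritance (Variable "$X") (Concept "B")))

and executing it would return something like

(SetLink
   (ValuationLink (Variable "$X") (Concept "A"))
   (ValuationLink (Variable "$Y") (Concept "B")))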

None of this helps you directly with the backward chainer; it's more about problems I've been getting tripped up by.

ngeiswei commented 8 years ago

A ValuationLink may make sense indeed.

ngeiswei commented 8 years ago

The pattern matcher way is indeed too convoluted and limited. I did manage to make all the examples work, but I had to hack the pattern matcher engine so that it doesn't choose (Concept "A") or (Concept "B") as starting points in example 4, which obviously leads nowhere, and instead chooses the Inheritance link. The problem then is that other unit tests, like SequenceUTest, fail.

Although I'm close to the goal, I think I'm gonna give up and implement a recursive function from scratch; it will be less convoluted, more elegant and more efficient. A function as general as that certainly shouldn't be located in the Backward Chainer code. I suppose I should put it somewhere under opencog/atomutils, or do you have a better suggestion, @linas?

ngeiswei commented 8 years ago

I meant "the pattern matcher way is too convoluted", not the pattern matcher itself (although it might true too ;-p).

linas commented 8 years ago

As mentioned earlier, there's already a fair amount of code that does a "half-unification" -- i.e. with variables on only one side. I haven't yet needed the symmetric, two-sided version that you describe.

MapLink has some of that code; there are parts of it scattered elsewhere. I don't quite recall.

Look at bool MapLink::extract() -- it's complicated, mostly to provide support for GlobNode, which you probably don't need/use.

ngeiswei commented 8 years ago

The problem is more difficult than I initially thought. The recursive function I'm writing covers almost all cases, but not all; to cover them all we need a satisfiability solver. Take this example:

unify((Inheritance (Variable "$X") (Variable "$X")),
      (Inheritance (Variable "$Y") (Variable "$Z")))

My current algorithm will choke on it because X maps to 2 different atoms, Y and Z, while in fact such an equation is satisfiable (for instance by grounding X, Y and Z all to the same atom). The trick of allowing duplicate mappings of variables with compatible types only goes so far, because ultimately you may end up with a network of variables, and checking their consistency isn't that easy.

So now, I'm left with 2 options

  1. Complete my recursive function, it will cover most cases, maybe even all BC use cases.
  2. Rewrite from scratch the right way, performing satisfiability checking (which requires more subtleties, like partitioning the expressions into DAGs of strongly connected components of variables; I think this is similar in spirit to the way the combo type checker handles mutually recursive functions, see https://github.com/opencog/moses/blob/master/moses/comboreduct/combo/procedure_repository.cc#L251)

I guess I'm gonna go with 1. for now, given how close I am and how unsure I am that we actually need 2., but if we're to use it in more general contexts, we would really need to implement 2.

bgoertzel commented 8 years ago

Nil. You'd better do it the right way or we will run into trouble before too long ...

ngeiswei commented 8 years ago

OK, I think a lot of what I've written so far can actually be reused. In any case, I'm happy to do it right.

linas commented 8 years ago

I'm with Nil ... trying to solve the satisfiability-modulo-equality-theory problem is theoretically interesting, but seems unlikely to be a practical problem for a long time to come. And, as Nil points out, there are some fairly simple tricks that can handle all of the common cases.

ngeiswei commented 8 years ago

I haven't thought very deeply about how to do it right, but I'd say it would take me an extra half week to a whole week; on the other hand, the tricky/limited way covering most cases should be ready tomorrow.

For the BC, the tricky/limited way would probably suffice for most user- and BC-generated queries. However, I can see that if the user runs a query with some variable occurring in multiple places, like

Inheritance
   X
   X

then it will choke. Of course I could hack the tricky/simpler algo so that it doesn't, but then it will eventually choke on something else.

Yeah, eventually we're gonna have to do it right, so the question is whether I should do it now or later. I'm happy either way.

ngeiswei commented 8 years ago

I've been thinking a bit more, and I think it's exactly like the mutually recursive function type checking algo of comboreduct. So I don't foresee any major hurdle; I stand by my ETA.

bgoertzel commented 8 years ago

An extra week to do this right is well worth it. You're in the fortunate position of being funded to just work on OpenCog and make it a solid AGI platform, so let's do things right when it's not insanely difficult or slow to do so...

linas commented 8 years ago

I trust Nil to "do the right thing" -- the problem of "taking an extra week" is that sometimes that results in over-engineering: at the end of the week, one suddenly realizes that there was some alternate approach that was better, simpler, easier, or maybe the current code isn't needed because the core concept was flawed. Sometimes the best solution is the minimal solution. (as Nil already discovered: unifier version 1.0 was a dead-end).

ngeiswei commented 8 years ago

After thinking a fair bit, I think that unifier version 2.0 might not be that much more complicated. It requires two steps:

  1. Generate all equality partitions
  2. Check that each partition is satisfiable

For 1., it should merely be a recursive function similar to version 1.0. For 2., for each partition's block, compute the type that intersects all its elements, and repeat till a fixed point is reached. To do that efficiently you indeed need to build a DAG, but at first we can just compute type intersections in random order, and only worry about that once efficiency is a concern.

So it seems the only tricky part is calculating type intersection. What I mean by that is: say you have for instance a block with

X
ListLink(Y)
ListLink(Z)

meaning that X is equal to ListLink Y which is equal to ListLink Z, each having the following types at that point (i.e. not having reached the fixed point yet)

X:Atom
ListLink(Y):ListLink(Atom)
ListLink(Z):ListLink(Atom)

then their type intersection will be

ListLink(Atom)

which is supposed to represent the set of all potential groundings that may satisfy that block.

I'm gonna have a look at Linas' type checking code in case such a type intersection function already exists; otherwise coding one shouldn't be very hard.
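Just to fix the idea, here is a toy sketch of such a type intersection on plain s-expression types, assuming nothing more than that Atom is the top type (illustrative only, not the real type-checker API):

(define (type-intersection a b)
  (cond ((equal? a b) a)              ; identical types
        ((eq? a 'Atom) b)             ; Atom subsumes every type
        ((eq? b 'Atom) a)
        ((and (pair? a) (pair? b)
              (eq? (car a) (car b))
              (= (length a) (length b)))
         (cons (car a) (map type-intersection (cdr a) (cdr b))))
        (else #f)))                   ; empty intersection: block unsatisfiable

; For the block above: (type-intersection 'Atom '(ListLink Atom)) => (ListLink Atom)
; Folding this over all the elements of a block, and repeating across blocks that
; share variables until nothing changes, gives the fixed point.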

ngeiswei commented 8 years ago

As I was looking at some random unification algos online, I realized that my solution doesn't implement unification per se, but rather satisfiability. Anyway, patching unification on top of that shouldn't be hard, and what I need for now is satisfiability (this is for rule selection), so I'm gonna go ahead and implement that...

ngeiswei commented 8 years ago

I'm done with that issue for now. I'm not closing it because I will come back to it in the near future.

ngeiswei commented 7 years ago

The unifier is nearly complete by now; I'm closing.