opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system
https://wiki.opencog.org/w/AtomSpace
Other
826 stars 234 forks source link

Recursively retrieve links of a node. #2472

Open Habush opened 4 years ago

Habush commented 4 years ago

Imagine we have the following relationship in a sample atomspace:

(Inheritance
    (Concept "A")
    (Concept "B")
)

(Inheritance
    (Concept "B")
    (Concept "C")
)

(Inheritance
    (Concept "B")
    (Concept "F")
)

(Inheritance
    (Concept "C")
    (Concept "D")
)

Assume the above Inheritance link represents child-parent relationship . Is there a way ,using the pattern matcher, to recursively find the parents (B), grandparents (C, F) and great-grandparent (D) ...etc of (Concept A)? Also can we limit the depth of the recursion? That is, if we specify 2, it will return the immediate parents and the grandparents of the node?

linas commented 4 years ago

No, not really. This is a very simple search, and you don't need the pattern matcher to accomplish this. You can perform this search in maybe half-a-dozen lines of scheme code. (or less)

The pattern matcher is designed for complex graph searches. What you describe above is just a very simple tree, and tree-walking is a very basic example of recursion.

Take a look at section 2.2.2 "Hierarchical structures" of SICP https://web.mit.edu/alexmv/6.037/sicp.pdf (page 147 of that PDF) -- if you have not read SICP, you should. It's a lot like taking powerful hallucinogenic drugs - everything you believe about (programming) reality will be turned upside-down and inside-out.

Habush commented 4 years ago

You can perform this search in maybe half-a-dozen lines of scheme code

I have already written the scheme code and was wondering if the PM can do it and if it that version has a better performance.

The pattern matcher is designed for complex graph searches. What you describe above is just a very simple tree, and tree-walking is a very basic example of recursion.

OK. But if it is so simple why not just include it. Also this can get more complicated if we want to filter the parents based on some property, (in the above code it is filtering them by their GO_namespace)

if you have not read SICP, you should. It's a lot like taking powerful hallucinogenic drugs - everything you believe about (programming) reality will be turned upside-down and inside-out.

Thanks, I have been putting off reading this book for so long. I guess I should read it now.

linas commented 4 years ago

I have already written the scheme code

Hmm. I have two issues with that code: its completely undocumented: what are the input values? What does it do, what kind of output does it generate? Then quickly scanning it, it seems to be inefficient, using constructs like flatten which should not be needed, and append, when cons should have been enough. I recall trying to read that code, and was completely lost, unable to understand what it did. It doesn't help that it calls things like find-parent which is quite complex itself. Again, for simple tree-walking, this seems to be excessively complicated and hard-to-understand.

and was wondering if the PM can do it

Again, no. The PM is not meant for recursively-defined structures or tree-walking. It is perhaps misnamed, it is actually a "subgraph isomorphism solver"; see https://en.wikipedia.org/wiki/Subgraph_isomorphism_problem

and if it that version has a better performance.

Why do you believe that it would have better performance?

But if it is so simple why not just include it.

Because that causes code-bloat, cruft, spaghetti-code and debuggability nightmares. Writing a tree-walk algorithm should not take more than 5 minutes (OK, so the first time you do it in your life, it will take an hour or two or more; but once you understand the general idea, its easy.)

When faced with "easy" algorithms, one has a trade-off: is it simpler to just write the code yourself (in 5 minutes) or spend 5 minutes to search the documentation to see if such a function exists, then five minutes to read the documentation to decide if it does what you need it to do, and then spend 5 more minutes testing it to make sure it actually does what you think it does, (and then discovering it does something slightly different) and then finally 5 more minutes to write a shim that adapts it to your code? Yuck. This process results in obscure, hard-to-read, hard-to understand stove-pipe code.

There are, of course, exceptions: for example, srfi-1 is a large collection of utilities for performing basic manipulations on lists. https://srfi.schemers.org/srfi-1/srfi-1.html Even though everything in there is "simple", using it is easier than writing your own, for three reasons: (1) it is well-designed, and clear, and documented. (2) everyone else uses it, which makes it easy for other people to understand your code, and for you to understand their code. (3) after using srfi-1 for a few days or weeks, it becomes easy to memorize and remember what it does - it is easier to use something from srfi-1, than it is to re-create it. (4) its fully debugged.

The key here is reason (1) well-designed. Once you get to know srfi-1 well, you will also spot a few areas where it is poorly designed; but these are "outlying regions", stuff you would not/should not normally do, except maybe in some emergency situation. Some 95% of it is in excellent shape, ad the remaining 5% is "good enough".

There are no trees in srfi-1. There are 179 srfi's as of today, and as far as I know, none of them deal with trees. I don't know why not. Perhaps no one has invented a good API for trees? i.e. nothing better than what you could create "in 5 minutes"?

linas commented 4 years ago

As an exercise, try to write the simplest tree-walking code possible for the example you give above, and post it here, and we can try to reconstruct it or fix it or simplify it further.

vsbogd commented 4 years ago

I believe it can be done using URE: https://github.com/opencog/ure

ngeiswei commented 4 years ago

I believe it can be done using URE: https://github.com/opencog/ure

Exactly :-). I'm really thinking of adding this BackwardChainerGetLink now, to make it almost as accessible as a pattern matcher call...

linas commented 4 years ago

OK, but I urge extreme caution. Lets not replace something that can run in some dozens of microseconds with something that requires dozens of milliseconds.

I'm not joking when I say "this tree can be walked in half-a-dozen lines of code".

In particular, "half-a-dozen lines" means that you can sight-read where the CPU cycles are going to be spent. In this case, I see:

The forth bullet above is important: if you represented the vertexes and edges in pure scheme, just using cons as shown in SICP, then your scheme program will run at least 30x maybe 100x faster than the Atom-using variant. So merely using the AtomSpace for your data-processing is giving you a HUGE performance slow-down. Why would you pay this large cost, unless you are winning back something even more important?

Using the URE to perform edge traversal layers maybe another 10x performance hit (or more?) on top of everything else ... so now we are talking about code that runs at least 100x if not 1000x slower than what any SICP homework exercise requires. Is it worth it?

(In my own defense, I guess one could also say that a tree-traversal in tinker-pop or one of the misc java-based graph DB's is also going to be 1000x slower than what raw scheme can do. But still -- lets be careful not to splurge.)

linas commented 4 years ago

p.s. BackwardChainerGetLink : open an issue somewhere to discuss and track. Sounds like a good idea, I suppose.

linas commented 4 years ago

Regarding chaining, in general: I spent the last month staring at the MOZI gene annotation code. It's .. interesting. It's consists of long hand-built chains, by which I mean:

(define (some-func args)
   BindLink
         ...query-body using the args...
         ExecutionOutput
               GroundedSchema "scm: some-other-func"
)

and some of these chains are 4-5-6 nestings deep. This is .. well, it was an unexpected design style. Several remarks and a proposal:

By fiddling with the above, I was able to get that code to run from 10x to 100x faster.

The BindLink... GroundedSchema layering .. that does indeed suggest that maybe URE and/or ChainerLink or something might be a better way of structuring the data flow. Maybe. I don't much like or care for the BindLink... GroundedSchema idiom -- it seems fragile, convoluted. It's a hard-coded pipeline.

bgoertzel commented 4 years ago

Thanks a lot for the help w/ Mozi Atomese code optimization, Linas ;)

Nil, I agree w/ your sense that using the URE to provide a general, easy-to-use tree-walking functionality for Atomspace is going to be valuable. Of course the right specialized piece of code will almost always be at least a little more efficient, but if we want to get wide adoption for OpenCog we need (among many other things) to make it less work for developers.... If optimization of a particular tree-walk is badly needed then a developer can always write their own code in that case...

On Thu, Jan 16, 2020 at 3:22 AM Linas Vepštas notifications@github.com wrote:

Regarding chaining, in general: I spent the last month staring at the MOZI gene annotation code. It's .. interesting. It's consists of long hand-built chains, by which I mean:

(define (some-func args) BindLink ...query-body using the args... ExecutionOutput GroundedSchema "scm: some-other-func" )

and some of these chains are 4-5-6 nestings deep. This is .. well, it was an unexpected design style. Several remarks and a proposal:

  • Some of these are simple enough that a srfi-1 variant would be faster/easier/more efficient, e.g. by a factor of 10x in a few cases where I tried it.
  • None of them made use of memoization, and instead ran the same search over and over. That is, called the same function, with the same arguments - each call is a pattern matcher search, which is slow. By memoizing the previous search results, and just returning those, another factor of 10x per is possible. Although there is a handful of memoiztion utilities (e.g. make-afunc-cache) they are not really "core" parts of the atomspace. We do not have any API that can say "hey, the Atomspace hasn't changed (much) since the last time you ran this search, so its safe for you to re-use the results from your last search".

By fiddling with the above, I was able to get that code to run from 10x to 100x faster.

The BindLink... GroundedSchema layering .. that does indeed suggest that maybe URE and/or ChainerLink or something might be a better way of structuring the data flow. Maybe. I don't much like or care for the BindLink... GroundedSchema idiom -- it seems fragile, convoluted. It's a hard-coded pipeline.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/opencog/atomspace/issues/2472?email_source=notifications&email_token=ABNCKXE7EBFLLBUSYNJLOYTQ55O4DA5CNFSM4KGVNBX2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJBPUXY#issuecomment-574814815, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNCKXDCFADLS2DVZTQQ2DTQ55O4DANCNFSM4KGVNBXQ .

-- Ben Goertzel, PhD http://goertzel.org

“The only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time, the ones who never yawn or say a commonplace thing, but burn, burn, burn like fabulous yellow roman candles exploding like spiders across the stars.” -- Jack Kerouac

linas commented 4 years ago

OK, so let's return to @Habush original question, and cross out the words "pattern matcher", like so: pattern matcher and instead phrase it slightly differently: "How can one describe recursive tree traversal in a declarative fashion, (i.e. as "atomese") instead of a procedural (c++/python) or functional (scheme, haskell) fashion?"

Before trying to answer the question, let me expand on it a bit more (repeating myself, slightly)

Well, maybe nothing is wrong with that, but I don't really like it. The reason I don't like it is because it starts to turn Atomese into "just another programming language", and the world doesn't need yet another one of those. Worse, programming in atomese is like hand-writing AST's (see the first figure in the wikipedia article) instead of programming in a human-readable language (the short program underneath that first figure). The short program is human-friendly, the AST is not. What AST's are, is that they are term-rewriting friendly. Suitable for knowledge representation (KR). Suitable for graph algorithms. I'd like to have atomese stay true to it's roots as a KR language, or a https://en.wikipedia.org/wiki/Abstract_semantic_graph language. I don't want atomese to morph into a badly-designed functional programming language.

So, again back to @Habush original question: "How can one express a recursive pattern match, in a declarative way, such that the execution of that recursive pattern match results in a re-write of the contents of the AtomSpace?"

I do not yet have a good answer for that. Where can we look for inspiration? Well, all macro languages are a form of term-rewriting systems. This includes the C/C++-preprocessor macro language, the m4 macro language, and the scheme hygenic macro language. They all share a common trait: they run once, and only once, on a given input, producing an output, and then they stop. Also, you have little or no control over how the macros are applied: the macros are applied in whatever order they can be, until application terminates.

I'm talking about macros because you can think of a single macro as a single "rewrite rule". Well, we already have those: we call them "BindLinks". and unlike macros, we can run BindLinks over and over, on demand, and we can even have the URE sequence them for us. So, we already have macro-style abilities, and more fine-grained, more controllable.

Are there recursive macros? Sort-of-ish. In c/c++, see for example: https://stackoverflow.com/questions/12447557/can-we-have-recursive-macros How about m4? ftp://ftp.gnu.org/old-gnu/Manuals/m4/html_chapter/m4_5.html Ooooof. How about recursive macros in scheme? First: a quick overview of macros in scheme: https://beautifulracket.com/explainer/macros.html and then the first pitfall: https://stackoverflow.com/questions/49632262/recursive-macro-in-scheme-causes-unexpected-loop and https://www.reddit.com/r/scheme/comments/9zl4fe/recursive_macros/ ooof. Perhaps @amirouche can say something enlightening?

The reddit discussion is interesting because it mentions PEG https://en.wikipedia.org/wiki/Parsing_expression_grammar Which now suggests another way to ask @Habush original question: is there a way to specify a PEG, but for graphs instead of for strings?

I think this is a sensible question, because when you study term re-writing, it doesn't take very long to realize that string re-writing, graph re-writing and term re-writing are all very similar in many ways, and so something like PEG, intended for strings, should have some kind of graph analog. Part two of the question is, of course: "would a PEG for graphs actually solve Habush's original problem?"

Anyway, those are the lines I'm currently thinking along, but I don't have any particular answer. Part of the problem is that Habush's original question is kind of ill-defined: atomspace operations never "return" anything; they just mutate the contents of the atomspace. So the original question failed to pose a constraint: after mutating the atomspace, what do you want the contents of the final atomspace to actually look like?

misgeatgit commented 4 years ago

We had some similar queries in Ghost for the VA project. Combining basic queries like cog-incoming-set and cog-outgoing-set together with BindLinks(creating them on the fly) in a recursive manner can be used as one of the solutions.

vsbogd commented 4 years ago

I'd like to have atomese stay true to it's roots as a KR language, or a https://en.wikipedia.org/wiki/Abstract_semantic_graph language. I don't want atomese to morph into a badly-designed functional programming language.

Atomese can be used to represent programs in a form of abstract semantic graph instead of AST and what is wrong with that? Of cause the goal is not inventing another programming language but represent program in a knowledge base.

Habush commented 4 years ago

Anyway, those are the lines I'm currently thinking along, but I don't have any particular answer. Part of the problem is that Habush's original question is kind of ill-defined

Okay let me illustrate my point by showing the scheme code I wrote for this specific example:

;;Given a concept node and a number this will recusively find the 0...(p-1)th parents of node
(define find-parents
    (lambda (node p) 
        (define parents 
            (let loop (
                [i p]
                ;;first find the immediate parents
                [ls (helper node)]
                [acc '()]

            )

            (cond 
                [(= i 0) (cons ls acc)]
                [(null? ls) acc]
                [else (cons 
                    (loop (- i 1) (helper (car ls)) (cons ls acc))
                    (loop i (cdr ls) '())
                    )]

            )
        )

        )

        parents
    )

)

(define helper 
    (lambda (node)
        (cog-outgoing-set (cog-execute! (GetLink 
          (TypedVariable (Variable "$a") (TypeNode 'ConceptNode))  
            (InheritanceLink 
                node
                (VariableNode "$a")
            )

        )))
    )
)

Using the above code if you run (find-parents (Concept "A") 2) will return the following pair

(((() ((ConceptNode "F")
 (ConceptNode "C")
) ((ConceptNode "B")
)) (((ConceptNode "D")
) ((ConceptNode "C")
))))

"How can one express a recursive pattern match, in a declarative way, such that the execution of that recursive pattern match results in a re-write of the contents of the AtomSpace?"

Yes, basically how can I express the above loop declaratively and get the same result.

Habush commented 4 years ago

Also imagine we have the following extra info in the same atomspace:

(Evaluation
    (Predicate "citizen")
    (ListLink 
        (Concept "F")
        (Concept "U.S")
    )

)

(Evaluation
    (Predicate "citizen")
    (ListLink 
        (Concept "B")
        (Concept "U.K")
    )

)

Now, in addition to getting the parents recursively, I also want to constrain the parents to be citizens of a country X. Modifying the above code to that end we have the following:

(define find-parents
    (lambda (node p country) 
        (define parents 
            (let loop (
                [i p]
                ;;first find the immediate parents
                [ls (helper node country)]
                [acc '()]

            )

            (cond 
                [(= i 0) (cons ls acc)]
                [(null? ls) acc]
                [else (cons 
                    (loop (- i 1) (helper (car ls) country) (cons ls acc))
                    (loop i (cdr ls) '())
                    )]

            )
        )

        )

        parents
    )

)

(define helper 
    (lambda (node country)
        (cog-outgoing-set (cog-execute! (GetLink 
          (TypedVariable (Variable "$a") (TypeNode 'ConceptNode))  
          (AndLink
            (InheritanceLink 
                node
                (VariableNode "$a")
            )
            (Evaluation
                (Predicate "citizen")
                (ListLink 
                    (VariableNode "$a")
                    country
                )

            )
          )

        )))
    )
)

The second part of my question (which I didn't clearly state on my original comment) is how can we also specify a constraint on the recursive search declaratively (like the above)?

linas commented 4 years ago

I'd like to have atomese stay true to it's roots as a KR language, or a https://en.wikipedia.org/wiki/Abstract_semantic_graph language. I don't want atomese to morph into a badly-designed functional programming language.

Atomese can be used to represent programs in a form of abstract semantic graph instead of AST and what is wrong with that? Of cause the goal is not inventing another programming language but represent program in a knowledge base

Nothing wrong with that. I'm ruminating on high performance and efficiency. The current AtomSpace is designed to traverse graphs with high performance. That's the one thing its good at. But this comes at a price: bulky Atoms, awkward API. Compare this to main-stream AST systems, and it looks perfectly absurd, crazy, untenable. Where, by "main-stream AST", I'm thinking GNU Lightning, or say, anything listed here: https://en.wikipedia.org/wiki/Intermediate_representation - I'm most familiar with GIMPLE and the LLVMIR. Part of what mkes an IR an IR, is that they have an extensive and very fancy, very sophisticated term-rewriting infrastructure (i.e. "BindLinks"), but they are optimized for traversing the AST trees downwards, only. The properties on the trees were semi-hard-coded (instruction names, register names, and so not general purpose, like atomese is) but they do have some general-purpose slots, like the Atom Values, for storing extended information (e.g. condition register info, interlocks, reservations, etc.). The gimple/llvm AST trees are optimized for extremely fast creation and disposal, and fast mutation under single threads (the re-write rules always run in a single thread, they always discarded the old version, after re-writing, or rather, they mutate it, in place.). By contrast, Atomese trees are immutable; the old version is always kept, a new one is created, and a hefty performance price is paid to stuff the new one into the atomspace index. In short, the performance of main-stream AST, and the performance of Atomese is very different, almost incomparable. No normal software engineer, in the process of inventing a brand new bytecode and IR, would choose Atomese for for the AST system. Atomese is too ... weird, for that.

Now, to contradict myself: if one was a compiler scientist, and one wanted to look at the IR generated by compiling 1000 different open-source projects, and one wanted to look at the statistical distribution of the AST's generated by the compiler, then, yes, actually, the AtomSpace might be the ideal, the perfect system for doing that. So sure.

linas commented 4 years ago

@Habush you write:

Using the above code if you run (find-parents (Concept "A") 2) will return the following pair

(((() ((ConceptNode "F") (ConceptNode "C") ) ((ConceptNode "B") )) (((ConceptNode "D") ) ((ConceptNode "C") ))))

Well, that is badly indented. Lets look at what you wrote, correctly indented:

(
   (
      (
         ()
         (
            (ConceptNode "F")
            (ConceptNode "C")
         )
         (
            (ConceptNode "B")
         )
      )
      (
         (
            (ConceptNode "D")
         )
         (
            (ConceptNode "C")
         )
      )
   )
)

which has almost no resemblance to what you originally wrote:

(Inheritance    (Concept "A")    (Concept "B"))
(Inheritance    (Concept "B")    (Concept "C"))
(Inheritance    (Concept "B")    (Concept "F"))
(Inheritance    (Concept "C")    (Concept "D"))

If I draw your input data as an ASCII-art tree, it would be

     A
     |
     B
    / \
   C   F
   |
   D

The most immediate, direct way of writing that ASCII-art as an S-expression would be (A (B (C (D) F))) or, in long-form

((Concept "A")
   ((Concept "B")
       ((Concept "C")
           ((Concept "D"))
        (Concept "F"))))

Compared to your output ... well, a comparison is difficult. Your code generated, for the first part

      (
         () 
         (  
            (ConceptNode "F")
            (ConceptNode "C")
         )  
         (  
            (ConceptNode "B")
         )
      )

so F and C are at the same level, and B is above that. But A is above B, where is A? And what is that extra nil doing? It looks like the '() is inheriting from both F and C? But there's no (Inheritance (Concept "F") (Concept "")) in your input data. So this is confusing. It looks like your algo inverted the tree, left-right, took some horizontal slices (which explains why C appears twice in the output), cut off the top (which explains why A is missing) and then added some extra stuff (the extra nil.)

Do you actually need this complicated structure? Can you explain the nil, and what it means?

There is also a meta-issue: The AtomSpace cannot store S-expressions directly. That is, you can't just store (P Q R) you have to store (Link P Q R), that is, for every open-paren, you have to write open-paren-link, so that for your output, it would be

(Link
   (Link
      (Link
         (Link)
         (Link
            (ConceptNode "F")
            (ConceptNode "C")
         )
         (Link
            (ConceptNode "B")
         )
      )
      (Link
         (Link
            (ConceptNode "D")
         )
         (Link
            (ConceptNode "C")
         )
      )
   )
)

Is this really the output that you want? And why is this output somehow easier to work with than, (for example)

(List (Concept "A")
      (List (Concept "B")
            (List (Concept "C")
                  (List (Concept "D"))
                  (Concept "F"))))

which is the direct S-expression version of your original InheritanceLink graph.... ?

linas commented 4 years ago

Here is a six-line-long scheme program to do a simple recursive DAG tree traversal, where the edges are InheritanceLinks. It's actually 19 lines, with documentation. If the InheritanceLink graph is not a DAG, this will go into an infinite loop!

; Traverse inheritance-link tree graph HEAD to depth DEPTH
(define (traverse HEAD DEPTH)
   ; If we've traversed to the end, but there are more
   ; branches below us, then say that.
   (if (eq? DEPTH 0) "unknown-branch"
      ; Otherwise, build a tree with HEAD at the top of the tree.
      (cons HEAD
         ; Walk over the all links of the form (Inheritance A B)
         ; and discard those where A is not HEAD. Keep only B.
         (filter-map
            (lambda (inh-lnk)
               ; Discard those where A is not HEAD.
               (if (not (equal? HEAD (gar inh-lnk))) #f
                  ; Recurse to get subtrees under B.
                  (traverse (gdr inh-lnk) (- DEPTH 1))))
            ; A list of all InheritanceLinks which have HEAD in them.
            (cog-incoming-by-type HEAD 'InheritanceLink)))))

Let's try it out:

(use-modules (opencog))
(use-modules (srfi srfi-1))
(Inheritance    (Concept "A")    (Concept "B"))
(Inheritance    (Concept "B")    (Concept "C"))
(Inheritance    (Concept "B")    (Concept "F"))
(Inheritance    (Concept "C")    (Concept "D"))

So:

scheme@(guile-user)> (traverse (Concept "A") 0)
$5 = "unknown-branch"

at level zero, A is unknown.

scheme@(guile-user)> (traverse (Concept "A") 1)
$6 = ((ConceptNode "A")
 "unknown-branch")

At level 1, A has just one unknown child.

scheme@(guile-user)> (traverse (Concept "A") 2)
$7 = ((ConceptNode "A")
 ((ConceptNode "B")
 "unknown-branch" "unknown-branch"))

At level 2, A has a single branch B, and B has two unknown children.

scheme@(guile-user)> (traverse (Concept "A") 3)
$8 = ((ConceptNode "A")
 ((ConceptNode "B")
 ((ConceptNode "C")
 "unknown-branch") ((ConceptNode "F")
)))

At level three, only C has an unknown child.

scheme@(guile-user)> (traverse (Concept "A") 4)
$9 = ((ConceptNode "A")
 ((ConceptNode "B")
 ((ConceptNode "C")
 ((ConceptNode "D")
)) ((ConceptNode "F")
)))

At level 4, the entire tree has been explored, there ae no unknown branches.

scheme@(guile-user)> (traverse (Concept "A") 5)
$10 = ((ConceptNode "A")
 ((ConceptNode "B")
 ((ConceptNode "C")
 ((ConceptNode "D")
)) ((ConceptNode "F")
)))

Just like level 4. Any DEPTH greater than 4 will just return the same tree as 4.

This time, lets generate valid Atomese:

(define (atraverse HEAD DEPTH)
   (if (eq? DEPTH 0) (Concept "unknown-branch")
      (List HEAD    
         (filter-map
            (lambda (inh-lnk)
               (if (not (equal? HEAD (gar inh-lnk))) #f
                  (atraverse (gdr inh-lnk) (- DEPTH 1))))
            (cog-incoming-by-type HEAD 'InheritanceLink))))) 

and verify:

scheme@(guile-user)> (atraverse (Concept "A") 2)
$12 = (ListLink
   (ConceptNode "A")
   (ListLink
      (ConceptNode "B")
      (ConceptNode "unknown-branch")
      (ConceptNode "unknown-branch")
   )
)

Finally, let's check citizenship:

(Evaluation (Predicate "citizen") (List (Concept "A") (Concept "U.K")))
(Evaluation (Predicate "citizen") (List (Concept "B") (Concept "U.K")))
(Evaluation (Predicate "citizen") (List (Concept "F") (Concept "U.S")))

(define (is-uk? ATOM)
   (and
      (not (equal? '() (cog-link 'ListLink ATOM (Concept "U.K"))))
      (not (equal? '() (cog-link 'EvaluationLink (Predicate "citizen")
          (List ATOM (Concept "U.K")))))))

(define (pred-traverse HEAD DEPTH PRED)
   (if (eq? DEPTH 0) (Concept "unknown-branch")
   (if (not (PRED HEAD)) (Concept "cut by predicate")
      (List HEAD 
         (filter-map
            (lambda (inh-lnk)
               (if (not (equal? HEAD (gar inh-lnk))) #f
                  (pred-traverse (gdr inh-lnk) (- DEPTH 1) PRED)))
            (cog-incoming-by-type HEAD 'InheritanceLink))))))

scheme@(guile-user)> (pred-traverse (Concept "A") 3 is-uk?)

$28 = (ListLink
   (ConceptNode "A")
   (ListLink
      (ConceptNode "B")
      (ConceptNode "cut by predicate")
      (ConceptNode "cut by predicate")
   )
)
linas commented 4 years ago

The point of the previous post is that you can do graph traversal just fine, using plain-old srfi-1 and some if-then-else work. You could also write equivalent code in python or C++, and it would be a little longer, but not much longer. And it would certainly run 10x faster than using the pattern matcher.

So, again, the pattern matcher is a device that takes the AtomSpace as the input, and mutates it into a new, different AtomSpace. What do you want the new, different AtomSpace to look like?

Note, by the way, that the following resemblance is NOT accidental:

(define (is-uk? ATOM)
   (and
      (equal? 'ConceptNode (cog-type ATOM))
      (not (equal? '() (cog-link 'ListLink ATOM (Concept "U.K"))))
      (not (equal? '() (cog-link 'EvaluationLink (Predicate "citizen")
          (List ATOM (Concept "U.K")))))))

and

 (GetLink
    (TypedVariable (Variable "$a") (Type 'ConceptNode))          
    (Evaluation (Predicate "citizen")
                (List (Variable "$a") (Concept "U.K"))))

They are both searching the atomspace for the same pattern. The first one performs the search in scheme, the second performs the search in the pattern matcher. I'm not sure which is faster -- it's probably a tie, to within a factor of 3x.

The main difference between these two is that the former, the scheme code, must be written by a human being, a programmer, trained in the art of software. The later, the GetLink can be written by an algorithm, e.g. by MOSES or by PLN or the URE or some other automated subsystem.

linas commented 4 years ago

There is an even easier way to do the above. First, a generic routine to walk generic DAG's:

(define (gtraverse HEAD DEPTH GET-TAIL)
"
   gtraverse HEAD DEPTH GET-TAIL

   Traverse a directed acyclic graph (DAG) that is rooted at HEAD
   up to DEPTH. The GET-TAIL function is used to follow the arrows
   of the graph. That is, given the head of a directed edge, the
   GET-TAIL function returns a list of the tails of all the edges.

   This converts from an implicitly coded DAG to an explicit one,
   using ListLinks.
"
   (if (eq? DEPTH 0) (Concept "unknown-branch")
      (List HEAD
         (map
            (lambda (tail) (gtraverse tail (- DEPTH 1) GET-TAIL))
            (GET-TAIL HEAD)))))

Next, three different examples of tail-getters:

; An example of a tail-getter. This traces InheritanceLinks.
(define (get-inh-tail HEAD)
   (filter-map
      (lambda (inh-lnk)
         (and (equal? HEAD (gar inh-lnk)) (gdr inh-lnk)))
      (cog-incoming-by-type HEAD 'InheritanceLink)))

Try it:

(gtraverse (Concept "A") 0 get-inh-tail)
(gtraverse (Concept "A") 1 get-inh-tail)
(gtraverse (Concept "A") 2 get-inh-tail)
(gtraverse (Concept "A") 3 get-inh-tail)
(gtraverse (Concept "A") 4 get-inh-tail)

How about the UK?

; Another example of a tail-getter. This traces InheritanceLinks,
; but discard those that don't live in the UK.
(define (get-uk-tail HEAD)
   (filter-map
      (lambda (inh-lnk)
         (and
            (equal? HEAD (gar inh-lnk))
            (is-uk? (gdr inh-lnk))
            (gdr inh-lnk)))
      (cog-incoming-by-type HEAD 'InheritanceLink)))

For the final example, I need a handy-dandy utility:

; A handy utility
(define (run-query QUERY)
"
   run-query QUERY

   Run the GetLink/BindLink QUERY. Unwrap and throw away the
   SetLink that it returned, and return the set contents as
   a scheme list.
"
   (define set-link (cog-execute! QUERY))
   (define answer (cog-outgoing-set set-link))
   (cog-delete set-link)
   answer)

and so the final example:

; A third example of a tail-getter. Uses the pattern matcher.
(define (get-tail-pat HEAD)
   (run-query
      (Get (Variable "$tail") (Inheritance HEAD (Variable "$tail")))))

The point of this last example is that you can now replace the GetLink by some arbitrarily-complicated search.

linas commented 4 years ago

And why stop there? We don't have to transform one kind of DAG into a ListLink DAG ... we can transform into some other format.

(define (dag-xform GET-TAIL MAKE-EDGE HEAD DEPTH)
"
   dag-xform GET-TAIL MAKE-EDGE HEAD DEPTH

   Traverse a directed acyclic graph (DAG) that is rooted at HEAD
   up to DEPTH. The GET-TAIL function is used to follow the arrows
   of the graph. That is, given the head of a directed edge, the
   GET-TAIL function returns a list of the tails of all the edges.

   The MAKE-EDGE function is then used to create a new DAG edge
   (presumably in some other, different format.)
"
   (if (eq? DEPTH 0) (Concept "unknown-branch")
      (MAKE-EDGE HEAD
         (map
            (lambda (tail) (dag-xform GET-TAIL MAKE-EDGE tail (- DEPTH 1)))
            (GET-TAIL HEAD)))))

Lets see if it works:

(dag-xform get-tail-pat ListLink (Concept "A") 2)

OK, so same as before. But instead of ListLink, we can try this:

(define (edge-maker HEAD LIST)
   (map
      (lambda (tail)
         (Evaluation (Predicate "foo") (ListLink HEAD tail)))
      LIST))

Try it:

(dag-xform get-tail-pat edge-maker (Concept "A") 2)

Or we can do it atomese-style:

(define (edge-putter HEAD LIST)
   (run-query
      (PutLink
         (Variable "$tail")
         (Evaluation (Predicate "bar") (ListLink HEAD (Variable "$tail")))
         (SetLink LIST))))

Notice how PutLink kind-of behaves like srfi-1 map -- this is not an accident, this is intentional.

linas commented 4 years ago

So, hang on a second, if PutLink kind-of behaves like srfi-1 map, then can't we write the whole thing in Atomese? Why yes, yes we can (almost) using the DefineLink (as I was doing this, I tripped over a bug that prevents the demo from going all the way through. And I am too lazy to fix this right now.) The details of how to do this were worked out by @ngeiswei a year or two ago or more. Basically, atomese has DefineLink which works kind-of-like the scheme define and it has a LambdaLink which works kind-of-like lambda and it has ExecutionOutputLink which works kind-of-like apply. Its all kind-of bulky and klunky to use so I don't recommend it. But here's a flavor of it all:

; The re-written DAG edges will look like this:
(define into-form
   (Evaluation (Predicate "yikes")
      (ListLink (Variable "$head") (Variable "$tail"))))

; A defined Lambda, in atomese.
(DefineLink
   (DefinedSchemaNode "make-an-edge")
   (Lambda
      (VariableList (Variable "$h") (Variable "$t"))
      (PutLink
         (VariableList (Variable "$head") (Variable "$tail"))
         into-form
         (List (Variable "$h") (Variable "$t")))))

; Lets try it out. Does it work? Yes.
(cog-execute!
   (ExecutionOutput
      (DefinedSchemaNode "make-an-edge")
      (List (Concept "X") (Concept "Y"))))

We also need the tail-getter:

; The input DAG edges that we search for
(define get-form
   (Inheritance (Variable "$head") (Variable "$tail")))

; Wrap it up in a function
(DefineLink
   (DefinedSchemaNode "get-the-tail")
   (Lambda
      (Variable "$head")
      (GetLink
         (Variable "$tail")
         get-form)))

; Does it work as expected? Yes.
(cog-execute!
   (ExecutionOutput
      (DefinedSchemaNode "get-the-tail")
      (List (Concept "A"))))

These can be composed "as usual"...

; Can we chain them together? Yes we can.
(DefineLink
   (DefinedSchemaNode "rewrite-one")
   (Lambda
      (Variable "$hd")
      (ExecutionOutput
         (DefinedSchemaNode "make-an-edge")
         (List
            (Variable "$hd")
            (ExecutionOutput
               (DefinedSchemaNode "get-the-tail")
               (List (Variable "$hd")))))))

; Does it work? Yes it does.
(cog-execute!
   (ExecutionOutput
      (DefinedSchemaNode "rewrite-one")
      (List (Concept "A"))))

Lets try recursion. This is where things break down; there's a bug in the atomspace code. I'm certain that variants of this, using EvaluationLinks and DefinedPredicateNodes do work correctly (they were used in the Hanson Robotics codebase), so I don't know why the DefinedSchema version does not work here.

; Do nothing.
(DefineLink
   (DefinedSchemaNode "no-op")
   (Lambda
      (VariableList (Variable "$hd") (Variable "$out"))
      (List (Variable "$hd") (Variable "$out"))))

; A complicated test.
(DefineLink
   (DefinedSchemaNode "test-rewrite")
   (Lambda
      (VariableList (Variable "$hd") (Variable "$out"))
      (ExecutionOutput
         (DefinedSchemaNode "no-op")
         (List
            (ExecutionOutput
               (DefinedSchema "get-the-tail")
               (List
                  (Variable "$hd")))
            (ExecutionOutput
               (DefinedSchemaNode "make-an-edge")
               (List
                  (Variable "$hd")
                  (Variable "$out")))))))

; WTF This does not work as expected; there's a bug in the atomspace code.
(cog-execute!
   (ExecutionOutput
      (DefinedSchema "test-rewrite")
      (List (Concept "A") (Concept "null"))))

Here's the recursive form:

; Define a recursive tree-walker
(DefineLink
   (DefinedSchemaNode "recursive-rewrite")
   (Lambda
      (VariableList (Variable "$hd") (Variable "$out"))
      (ExecutionOutput
         (DefinedSchemaNode "recursive-rewrite")
         (List
            (ExecutionOutput
               (DefinedSchema "get-the-tail")
               (List
                  (Variable "$hd")))
            (ExecutionOutput
               (DefinedSchemaNode "make-an-edge")
               (List
                  (Variable "$hd")
                  (Variable "$out")))))))

But it also doesn't work -- it goes into an infinite loop. Not sure why. There's a bug somewhere. DO NOT RUN the below, it will loop and you will have to kill -9 guile.

(cog-execute!
   (ExecutionOutput
      (DefinedSchema "recursive-rewrite")
      (List (Concept "A") (Concept "null"))))

so there are two-bugs: the no-op scheme didn't execute at all, and the recursive-rewrite executes too much. Bummer.

I left out the decrement-the-depth code, just to keep it simple. You can add it back in, using GreaterThanLink and MinusLink and NumberNode ...It's more or less the same as writing scheme, but in Atomese, which means that it's more verbose, more klunky, hard (for humans) to read. It mostly all works, there's a dozen or so unit tests for this kind of stuff, but apparently not for DefinedSchemas.

ngeiswei commented 4 years ago

@Habush the following example

https://github.com/opencog/pln/tree/master/examples/pln/ancestors

shows how to use the PLN module to solve your problem with a few lines of code.