noskill opened this issue 5 years ago
See also my arguments on caching vs. deterministic grounded schema nodes: https://github.com/opencog/atomspace/issues/1907#issuecomment-439340307
Seems reasonable. However: unless there is a current, actual need for this, I suggest that it's a poor idea to implement it right now. Why?
- It adds implementation complexity to an already-complex system.
- It incurs extra overhead costs for all users, during both the pattern compilation and the pattern runtime steps, that makes all users run slightly slower. If there is no one user who will run a lot faster as a result, the change is not worth it.
- Users with inefficient grounded schema do have a work-around, so this is not a critical change. That is, they can get a comparable performance boost in other fairly simple ways.
So: are there use cases, e.g. in PLN, where this would make a difference?
@noskill, are you suggesting to add some optional argument to `cog-execute!`, something like `(cog-execute! my-bind-link #:cache #t)`?
The other thing I had suggested, besides having the caching take place within the grounded code, is to define a `CachedGroundedSchemaNode` type and implement a generic cache within its factory. Or, more abstractly, a `StatelessGroundedSchemaNode` (as @linas might have suggested, IIRC).
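For concreteness, here is a minimal sketch of how such a type might be used, assuming a hypothetical `CachedGroundedSchema` node type existed (the schema name `scm: expensive-fn` is likewise made up for illustration):

```scheme
;; Hypothetical: a drop-in replacement for GroundedSchema whose factory
;; memoizes results, keyed on the argument ListLink.
(ExecutionOutput
  (CachedGroundedSchema "scm: expensive-fn")
  (List (Concept "banana") (Concept "apple")))
```

The idea would be that the first execution calls the Scheme function, while repeated executions with the same arguments return the memoized value without calling out to Scheme again.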
@linas said
> It adds implementation complexity to an already-complex system.
Agreed, but I also think it's a fairly reusable piece of functionality.
> It incurs extra overhead costs for all users, during both the pattern compilation and the pattern runtime steps, that makes all users run slightly slower.
Why? Such a feature should be optional. It would only be costly if enabled, right? Or maybe you mean in case it is the default. Then yes, users who do not want this feature would pay the extra cost. I would bet that most users who do not want caching care more about its effects than its cost, though. Think for instance of a pattern composed of multiple random-generator calls: caching then makes a big difference in behavior.
> Users with inefficient grounded schema do have a work-around, so this is not a critical change. That is, they can get a comparable performance boost in other fairly simple ways.
Agreed as well.
All to say, I'm fairly neutral on that issue: on one hand it seems like something many users would want, on the other it might not be the most critical thing to add ATM. I'd be totally open to adding a `StatelessGroundedSchemaLink` and `StatelessGroundedPredicateLink`; it's pretty clear to me these would be used time and time again. There could probably be a `StatelessLink` that any stateless specialized link could inherit, which would lead to an elegant implementation. Now, maybe one doesn't want to equate stateless with caching, so maybe some extra option might need to be introduced, I don't know. I don't mind equating stateless with caching and seeing how it unfolds.
I have another idea for implementing it; maybe it was already evaluated. In regex queries one can specify the part of the query to return in the result, something like `s/(.+)/\1/`, where `\1` returns the first matched group `(.+)`.

In a BindLink, the query clauses could contain an atom that saves the result of a calculation in a variable, and the rewriting clauses could then use this variable to return exactly what was calculated during query matching.
```scheme
(define bind-link
  (BindLink
    (VariableList
      (Variable "$X")
      (Variable "$Y")
      (Variable "$R"))
    (And
      (Inheritance
        (Variable "$X")
        (Concept "color"))
      (Inheritance
        (Variable "$Y")
        (Concept "color"))
      (Evaluation
        (GroundedPredicate "scm: classify")
        (Let
          (Variable "$R")
          (ExecutionOutput
            (GroundedSchema "scm: schema2")
            (List
              (Variable "$X")
              (Variable "$Y"))))))
    (Variable "$R")))
```
I misunderstood when recalculation of the evaluation link happens. So there are two cases: 1) in a BindLink rewrite clause, 2) during consecutive calls by the rule engine. Case (2) is addressable by external caching; case (1) is addressable by Vitaly's proposal or by stateless/caching links.
There already is an equivalent of `Let`; it's called `StateLink`. However:

1) No clue what happens if you use it in a BindLink like that; it might mis-behave.
2) `StateLink` is global -- it always lives in the atomspace, and everyone sees it. By contrast, I assume the `LetLink` goes out of scope once the search is done. Thus, StateLink cannot be safely used in parallel searches in different threads.

Overall, I think I like the idea of a locally-scoped `LetLink`. See below.
Answering @ngeiswei's question:

> extra overhead costs for all users
Let's assume that there is no `LetLink`. Then during pattern compilation, we'd have to look at every clause, look for every grounded schema, then look at every other clause, find any schema there, and then try to decide if it is of the same kind or a different one. If it's the same kind, then somewhere in `class Pattern` we record this (in some `std::map`, for example), with the `std::map` acting as a private, internal `let` to hold the two (or more) locations. The above search happens even if there is no grounded-anything in the pattern. It's a small overhead, but it's still overhead. (We still don't have a BindLink benchmark...)
Next, during pattern execution, every time a Grounded-something is encountered, that `std::map` in `class Pattern` has to be consulted, to see if that Grounded-something needs to be memoized. The `std::map` has to be consulted every time, even if it's empty.
OK, now please join me on the following wild ride.
It seems like the `LetLink` is a good idea. It is easy to implement. Since it is of local scope, it can be safely used when performing searches in parallel in multiple threads. Note that, since it is purely local, it only has to memorize one thing; it does NOT have to memorize N things for N values! This makes it simple, fast, low-cost, with no risk of exploding RAM usage (cache misses, etc.)
Since a `LetLink` is locally-scoped, it cannot be used across different terms in the rule engine.
`StateLink` is globally scoped. However, it can only remember one thing at a time, and has no awareness of "arguments", or execution-anythings, or Grounded-anythings.
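To illustrate the "one thing at a time" behavior (the anchor and concept names below are made up for the example):

```scheme
;; StateLink keeps only one value per key atom in the atomspace; asserting
;; a new value for the same key replaces the previous one.
(StateLink (Anchor "traffic-light") (Concept "red"))

;; After this, only the "green" state remains attached to the anchor.
(StateLink (Anchor "traffic-light") (Concept "green"))
```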
So, let's imagine a `CacheLink`, which might look like the following:
```scheme
(CacheLink
  (VariableList (Variable "$X") (Variable "$Y"))
  (ExecutionOutputLink
    (GroundedSchema "py:foobar")
    (List (Variable "$X") (Variable "$Y"))))
```
The semantics of this is that, if the return value for X,Y is already known, then it is returned. If it is not known, then the GroundedSchema would be executed, and the resulting return value would be memorized.
So: every time that a `GroundedSchema` is evaluated, we are supposed to put an ordinary schema in the atomspace. Viz. every time we run
```scheme
(ExecutionOutputLink
  (GroundedSchema "py:foobar")
  (List (Concept "banana") (Concept "apple")))
```
we are supposed to create the following, in the atomspace:
```scheme
(ExecutionLink
  (Schema "foobar")
  (List (Concept "banana") (Concept "apple"))
  (Concept "banapple"))
```
And also the converse: every time that we encounter a `GroundedSchema`, we are supposed to first look and see if there already exists an ordinary `ExecutionLink` that holds the desired answer, use that cached value first, and only call the `GroundedSchema` if the `ExecutionLink` does not exist. See https://wiki.opencog.org/w/ExecutionLink
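A rough sketch of what that "check the atomspace first" lookup could look like, reusing the banana/apple example above (this is just an illustration of the idea, not an existing mechanism):

```scheme
;; Search for a previously-recorded ExecutionLink holding the answer.
(cog-execute!
  (Get
    (Variable "$result")
    (ExecutionLink
      (Schema "foobar")
      (List (Concept "banana") (Concept "apple"))
      (Variable "$result"))))
;; If the returned SetLink is non-empty, use the cached value; otherwise
;; run the GroundedSchema and record a new ExecutionLink as shown above.
```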
Now, for some critiques and comments.
A lot of stuff has happened in ten years. Back then, using ExecutionLinks to cache the results of ExecutionOutputLinks made sense. Back then, the pattern-matcher did not exist. Question: does this old idea still make sense? Or is it a bad idea?
I kind-of like the `CacheLink` idea. But if atomspaces are caches, maybe we should formalize that: rename `CacheLink` to `AtomSpaceLink`? That way, we really can have multiple atomspaces in a controllable fashion. Question: what are the details of making this work? (Please note: there are other reasons for having an AtomSpaceLink; it simplifies certain distributed-processing scenarios.)
I would separate the following issues:

(1) how to return the same thing which was calculated by `ExecutionOutputLink` during the pattern matcher search (this issue)
(2) how to not calculate an `ExecutionOutputLink` with the same parameters twice, if that is required (issue https://github.com/opencog/atomspace/issues/1907)

There are solutions for (2) which also solve (1). For instance `AtomspaceLink`, if I got the idea right (my understanding is that `AtomspaceLink` would cause `ExecutionOutputLink` results to be added to the atomspace in the form of `ExecutionLink`s).
I think `CacheLink` is not declarative enough to be a part of Atomese. The `LetLink` I meant should not cache, but just designate some part of the query, to make sure the same part of the query is used in a few different places. And it cannot be used to solve problem (2); it solves only (1). Yes, it is more similar to `StateLink`.
One of the side effects of the `AtomspaceLink` I see is that it adds an `ExecutionLink` to the atomspace each time it is calculated, not only when all clauses match. @linas, do you mean that an `AtomspaceLink` within a pattern matcher clause would keep results in an internal pattern-matcher fork of the atomspace?
Looking at the ExecutionOutputLink wiki I found that an `ExecutionLink` can be used to calculate the result of the `ExecutionOutputLink` and return it as a new `ExecutionLink` instance. It sounds similar to `AtomspaceLink` as I understand it.
@vsbogd as far as I understand it, `AtomspaceLink` would insert the atom in a designated atomspace, not necessarily the existing one. The advantage of inserting in a new atomspace is that it would be really fast to retrieve and wouldn't clobber the existing atomspace(s).
The following issue mentions using `ExecutionLink` while executing `ExecutionOutputLink`: #1795.
BTW, there is no `ExecutionLink::do_execute()` as mentioned in https://wiki.opencog.org/w/ExecutionOutputLink#Execution; I think @linas means `ExecutionOutputLink::do_execute`.
@vsbogd yes, separating this into two issues is a good idea. I do like `LetLink` because it seems simple, "obvious", not hard to code up, and has a minimal performance impact on non-users. So if you really, really want to have it, go for it. It seems harmless. (I cannot resist a mean joke at this point. Or a warning: Nil will find some way of combining it with a Quote/UnquoteLink and it will drive you crazy.)
The second part is this muddy mish-mash of Execution vs. ExecutionOutput vs CacheLink vs. AtomSpaceLink and it is quite unclear exactly how these should work and interact. When I start thinking about the details, I get a mess that I don't quite like. There are unpleasant side-effects, unpleasant implications. I could write a long post where I ponder the pros and cons of different implementations for "all this". But it would be hard to write, and maybe not-fun to read. So I encourage: maybe you should think about the best way to do something like that, and what it would mean. Then we can all compare notes?
I mean, I could spend a day or two or more, pondering the question of Execution vs. ExecutionOutput vs CacheLink vs. AtomSpaceLink and how that would all work. And perhaps, after ten years of futzing with other things, the time has come for this. ... Hmmm.
Below is my attempt to understand, from the very beginning, whether the pattern matcher should treat `ExecutionOutputLink` as a function without side effects or not.
I can see three cases:

1. `ExecutionOutputLink` is like an arithmetic operation: it has no side effects and returns the same result each time.
2. `ExecutionOutputLink` learns something as it goes and returns a more and more precise result each time it is called.
3. `ExecutionOutputLink` is like measuring temperature (or another environmental condition), which may differ from measurement to measurement.

The only case which is really broken by unconditional caching of `ExecutionOutputLink` results is case (2). But it is questionable whether case (2) makes sense, as many searches can be performed to improve the result instead of making many calls during one search.
The natural solution for case (3) is doing the measurement before the search, adding the results to the atomspace, and using them in the search query. The same technique can be applied to make the search in case (1) more effective. If the `ExecutionOutputLink` parameters have variables which should be grounded during the search, then the query can be split in two: a first query to ground the variables, then execution of the `ExecutionOutputLink`, then a second search to find the result of the original query.
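A rough sketch of that split, reusing the color/classify example from earlier in this thread (this is just an illustration of the idea, not tested code):

```scheme
;; Phase 1: ground $X and $Y without calling the grounded schema.
(define groundings
  (cog-execute!
    (Get
      (VariableList (Variable "$X") (Variable "$Y"))
      (And
        (Inheritance (Variable "$X") (Concept "color"))
        (Inheritance (Variable "$Y") (Concept "color"))))))

;; Phase 2 (not shown): for each pair of groundings, run
;; (GroundedSchema "scm: schema2") once and record the result as an
;; ordinary ExecutionLink; the second, original query then matches those
;; recorded ExecutionLinks instead of re-running the schema.
```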
In other words, I don't see a reason for the pattern matcher to treat `ExecutionOutputLink` as a function which has side effects while the pattern matcher performs a search. Doing so means that each time, the search returns different results. But that makes sense only for case (2), and that can be solved by using more searches. Maybe I don't quite understand the set of problems which the pattern matcher solves.
I didn't think about performance here. I would like to understand whether we have a use case where an `ExecutionOutputLink` used within a pattern matcher query has side effects by design.
> I didn't think about performance here. I would like to understand whether we have a use case where an ExecutionOutputLink used within a pattern matcher query has side effects by design.
The only useful case I can think of is calling the same random generator at different places in the clauses. I've never needed that, but I think @linas has, when experimenting with controlling Sophia.
I can think of scenarios where every time some schema is executed it increases the strength of some TV, and some virtual link also depends on that TV; then it would make a difference. But do we need that?
Generally speaking, stateless is preferable when possible, I think.
If it were to become the default I don't think it would negatively impact my work for instance, but that may differ for others.
I like the `LetLink` idea.

I also like the `StatelessLink` idea and I don't see anything detrimental about it. Surely the overhead would be negligible for non-users. On top of that, if it speeds up stateless usages (which is probably most cases) then it's a win. I'm not even arguing for it being the default; prepending `Stateless` to a link type is not much to ask anyway.
To answer @vsbogd's question: `ExecutionOutputLink` is supposed to call the `GroundedSchemaNode`, which causes scheme or python code to run. It is impossible to know what that code might do, so we have to assume it can do anything, including turning on the lights in a room on the other side of the planet.
In the distant past, the Sophia robot made extensive use of `GroundedSchemaNode` and `GroundedPredicateNode` to attach ROS interfaces to both sensory inputs and motor outputs. Sensory inputs are things like "face 42 is now visible", and outputs include things like "smile and blink three times". The ROS interfaces are not in the opencog repo, mostly because I did not want to make ROS a prerequisite for building opencog. I'm not sure what repo it is in, and I am not sure if `GroundedSchemaNode` is still used (but it probably is?)
I don't think the robot code uses `GroundedSchemaNode` in patterns (but I could be wrong; the ghost people would know for sure). Mostly, the `ExecutionOutputLink`s were used with `PutLink`: we would use `GetLink` (i.e. the pattern matcher) to figure out if the robot should smile or frown, and then use `PutLink` with an `ExecutionOutputLink` to send the smile/frown command to the robot.
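Roughly, the combination looks like the sketch below; the predicate and schema names (`visible face`, `py: do-smile`) are invented here for illustration, not taken from the actual robot code:

```scheme
;; Get finds the groundings; Put substitutes each one into the
;; ExecutionOutput body, whose execution calls the grounded python schema.
(cog-execute!
  (Put
    (ExecutionOutput
      (GroundedSchema "py: do-smile")
      (List (Variable "$face")))
    (Get
      (Variable "$face")
      (Evaluation
        (Predicate "visible face")
        (List (Variable "$face"))))))
```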
Also: some of the demos explicitly show stateful behavior. One example involves a drag-race light-pole: three blinks of red, one blink yellow, one blink green. I don't recall if this uses GroundedSchema or GroundedPredicate. Another example builds a finite state machine in atomese. Both examples are also unit tests.
@linas, yes, sure, a `GroundedSchemaNode` can call some stateful python code. And I have tried to cover your Sophia example (where a `GroundedSchemaNode` returns the state of Sophia's sensors) in my comment above, as case (3).
The question is whether it is acceptable for the pattern matcher functionality to pre-cache the result of a `GroundedSchemaNode` before running a query. My opinion is that it is acceptable, because a pattern matcher query is not some endless process: it gets an input and returns a result. When the state of the environment changes, the system makes another call to the pattern matcher to get a new result after the change. And this processing happens outside of the pattern matcher implementation.
I remember the finite state machine example; I will look at it more closely.
Yes, it is acceptable for the pattern matcher to cache the result of a `GroundedSchemaNode` during the query.
To "pre-cache before the query" doesn't make sense: there are presumably one or more variables, and you can't know what values those variables might take, unless you run the query.
Caching anything "before the query" brings us back to the tangle of ideas about recording (saving) results of `ExecutionOutput` into `ExecutionLink`, and/or some kind of `CacheLink`, and/or some kind of `AtomSpaceLink`.
The reason that it is a tangle is that there are no clear-cut use cases to aid thinking about how these should work. I don't want to invent a solution that is looking for a problem to solve. The initial example from @noskill is clear-cut, and seems to have a fairly nice, clear-cut solution with `LetLink`, but it's not a strong enough example to drive deeper, more complex changes.
@linas, you said
> Yes, it is acceptable for the pattern matcher to cache the result of GroundedSchemaNode during the query.
but you also said
> but it's not a strong enough example to drive deeper, more complex changes
So what is the conclusion? Are you OK with modifying the pattern matcher to cache results within a query (not across queries, we all agree on that)?
I'm asking because, in that case, I don't think `LetLink` is necessary anymore. I think within-query caching is actually the simplest thing to implement, no need to introduce `LetLink` or `StatelessLink`, and I don't anticipate problems with it. The only problem I can think of is wanting to reuse, say, the same random generator across different clauses within the same query. It's not a problem for me, but I wonder if it would be a problem for someone else...
> Are you OK with modifying the pattern matcher to cache results within a query
These were my concerns. Two of them. One is that I'm pretty sure this will break existing unit tests and existing examples, e.g. the red-light, green-light demo. A second reason is that I think it is computationally expensive to figure out what can be cached, when, and where. The reason I like LetLink is that it removes the computational overhead: it tells you exactly what is allowed to be cached.
Maybe I'm wrong about both. There are many stateful GroundedPredicate demos and unit tests. Maybe none of them are inside of BindLink/GetLink. I don't recall. So maybe this change can be made without breaking unit tests. I don't know.
Re: computational complexity: perhaps the implementation can be trivial. Maybe the results are already cached in the C++ structures in PatternLink and/or the PatternMatchEngine. If they are not yet there, I can imagine that maybe they are easy to add. So maybe this is actually easy to implement. So, go try it. If it's easy, do it.
Re: the random generator, it's got nothing to do with that. For the robot, we use GroundedSchemas for obtaining the locations of people, interfacing to the face-recognition subsystem, etc., which are very definitely stateful. However, they can be treated as being stateless within a single run of the pattern matcher.
> One is that I'm pretty sure this will break existing unit tests and existing examples, e.g. the red-light, green-light demo.
As far as I know, @noskill looked into this example recently; maybe he can comment on this particular case. But I agree that implementing it may break the unit tests, and we should probably try to find examples which require respecting statefulness.
> A second reason is that I think it is computationally expensive to figure out what can be cached, when, and where.
Agreed, that should be taken into account.
> For the robot, we use GroundedSchemas for obtaining the locations of people, interfacing to the face-recognition subsystem, etc., which are very definitely stateful. However, they can be treated as being stateless within a single run of the pattern matcher.
Thanks, that is one of the things I wanted to know your opinion about.
> they can be treated as being stateless within a single run of the pattern matcher.
>
> Thanks, that is one of the things I wanted to know your opinion about.
I would like to think of a single run of the pattern matcher as being very much like a single, indivisible, atomic run where time stops, everything is frozen, nothing changes, and then a result is magically produced, and time starts running again. In this sense, it should be a lot like a database query, and so should resemble the ACID style as much as possible. At this time, I want to avoid an ACID vs. BASE debate, at least in this issue (we can have a different issue for an ACID vs. BASE debate). Ground rules would be to read the Microsoft paper that explains that NoSQL is actually coSQL (viz. the opposite: the direction of all arrows is reversed). For opencog, the debate is harder, because we have arrows going in both directions, so taking the co- to get BASE is confusing. So we cannot, right now, have perfect ACID or BASE. Like I say, this is a different, confusing debate.
> time stops
This could literally be enforced, by taking a global lock on the atomspace, thus preventing atoms from being inserted or deleted, until the query is finished. However, this would be a rather extreme step at this point; I would not want to take it. (locks prevent parallelism) So this is an example where BASE ideas are nicer.
My proposal is to add a new flag to the bindlink function, to reuse matched execution/evaluation links in the pattern matcher search:
Example:
Here we want to return from bindlink exactly what was matched and used in the AndLink matching. The current implementation would recompute the ExecutionOutput link.
One way to implement it is to treat GroundedSchema as a deterministic function and store evaluations in a temporary atomspace. That would allow the use of many heuristics for reducing the search space, like the minimum-remaining-values heuristic.
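A minimal sketch of what invoking such a flag might look like at the Scheme level, assuming a hypothetical keyword argument (the name `#:reuse-matched` is made up here, echoing the `#:cache` suggestion earlier in the thread) and reusing the `bind-link` defined above:

```scheme
;; Hypothetical: ask the pattern matcher to reuse, in the rewrite, the very
;; ExecutionOutput/Evaluation results it computed while matching the clauses,
;; instead of recomputing them.
(cog-execute! bind-link #:reuse-matched #t)
```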