Replace AtomSpace with AtomSpaceLink #1967

opencog / atomspace

The OpenCog (hyper-)graph database and graph rewriting system

https://wiki.opencog.org/w/AtomSpace

Closed: linas closed this issue 1 year ago

linas commented 5 years ago

This is an old idea, but without its own issue. Time to describe it more carefully. This would resolve issues #1855 and #1921. This depends on #1502 being implemented first.

Proposal: Implement a new kind of very special link, called an AtomSpaceLink, that behaves a lot like a MemberLink. Move Values from Atoms to AtomSpaceLinks. Design pros and cons follow.

Why?

There is a generic need to limit some operation to some smaller set of atoms. Examples include the AttentionBank, the "focus set" during inference (issue #1984), and the MapLink inputs. (https://wiki.opencog.org/w/MapLink)
The above is currently handled in different ad hoc ways. The "focus set" is kept in C++ code, and is passed around with C++ pointers in an ad hoc way. There is only one focus set (because there is only one attention bank). Meanwhile, the MapLink inputs are kept in a SetLink. Temporary inference results are kept in temporary atomspaces...
Having a generic mechanism to manage multiple "focus sets", or "atomspaces" or "SetLinks" would be beneficial.
Having a generic mechanism that is fast, easy to use, easy to understand, and free of crazy side-effects would be best!

Pros:

By moving values to the AtomSpaceLink, the bungled incoming set issue of #1921 is solved. Atoms become, once again, globally unique.
The AtomSpaceLink becomes a kind of ContextLink, and so contextualized TVs become "easy to handle" (e.g. "the sky is blue" has a large truth value on Earth, a small one on Mars).
Confusion about nested atomspaces, and multiple atomspaces is lessened. Atomspaces no longer have to be strict nestings; they can intersect in non-trivial ways.
Data sharing, i.e. making one atomspace read-only while another, smaller one is read-write, becomes simpler. The agi-bio/genomic people want this; see issue #1855.
MapLink/BindLink can be unified. So can GetLink/FilterLink. No need for two similar-but-different code bases to do almost the same thing.
@ngeiswei claims that this enables an "efficient URE-based pattern miner", in a comment in #1502
The set of all AtomSpaces, and their relationships to one another, becomes a Kripke frame (a modal frame). Thus, each atomspace is a "possible world", and a FrameLink describes accessibility (the FrameLink replaces the current parent-child relationship in atomspaces).
This allows atomspaces to become smaller, and also more numerous; this allows atomspaces to be modal, and more directly represent "beliefs". This allows atomspaces to be more easily used as "temporaries", holding half-finished proofs, half-finished chains and "backward inference trees" (BITs); the idea is that a BIT is just a Kripke frame.
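A minimal sketch of the "possible worlds" idea, under loud assumptions: atoms are plain strings, and `Frame`, `add()` and `contains()` are hypothetical names, not the AtomSpace API. Each frame holds some atoms plus a list of accessible frames; an atom is visible in a frame if it is held locally or in any frame accessible from it. Because a frame may list several parents, atomspaces need not be strictly nested.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: a "frame" (possible world) is a set of atoms plus
// a list of accessible frames. Not the real AtomSpace/FrameLink API.
// Assumes the accessibility graph has no cycles.
struct Frame {
    std::set<std::string> atoms;                  // atoms held directly here
    std::vector<std::shared_ptr<Frame>> parents;  // accessible frames

    void add(const std::string& atom) { atoms.insert(atom); }

    // Visible if held here, or in any frame reachable through parents.
    bool contains(const std::string& atom) const {
        if (atoms.count(atom)) return true;
        for (const auto& p : parents)
            if (p->contains(atom)) return true;
        return false;
    }
};
```

For example, two "belief" frames can both see a shared base, and a third frame can see both of them at once, which a strict parent-child chain cannot express.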

Cons:

Performance, performance, performance. Access to values (truth values) on atoms is currently very fast; how can we keep this?
A naive implementation of AtomSpaceLink will bloat the atomspace with pointless RAM usage. It needs to somehow become more virtual, in the way that GreaterThanLink is virtual...
Having lots and lots of atomspaces (large and complex Kripke frames) might confuse pattern matching!? And maybe have a negative performance impact on it?

linas commented 5 years ago

Here's a sketch of one possible implementation:

Create a C++ class AtomSpaceNode. It would be a wrapper around the existing class AtomSpace. It inherits from class Node so that it's a real, valid node.
The AtomSpaceNode "hash" would be the AtomSpace::uuid.
Create a C++ class AtomSpaceLink. It inherits from class Link so that it's a real, valid link.
Remove Values from Atoms. Place Values on AtomSpaceLinks.
Remove the AtomSpace* from class Atom; replace it by a weak pointer to class AtomSpaceLink. This becomes the "first" or "primary" or "default" atomspace for that atom. Other atomspaces can be found via the incoming set (no modification needed to the incoming set to support this)
Provide backwards compatibility for Atom::get_value(), etc. by chasing the weak link. Alternative: keep values in a map on the atom; use the atomspace to index into the map.
Create a FrameLink that denotes "child" atomspaces. Use it to find child atomspaces, as needed, i.e. rip out all the child-atomspace code and replace it by a lookup via Frames.
In order to fix #1921, we need to keep a single, global std::set<Handle> that can locate any existing atom, no matter what atomspaces it might be in. (Actually std::unordered_multimap<ContentHash, Handle> instead of std::set<>, for efficiency.)
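The global lookup table from the last bullet can be sketched as follows. This is an illustration under simplifying assumptions: an `Atom` here is just a type and a name (no outgoing set), `content_hash()`, `AtomTable` and `intern()` are hypothetical names, and the hash-mixing constant is an arbitrary choice. The multimap matters because two different atoms can share a content hash, so a lookup has to compare full contents within the bucket.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Simplified stand-in for an Atom: a type plus a name (no outgoing set).
struct Atom {
    std::string type;
    std::string name;
    bool same_content(const Atom& o) const {
        return type == o.type && name == o.name;
    }
};
using Handle = std::shared_ptr<Atom>;
using ContentHash = std::size_t;

// Combine the hashes of type and name (boost-style mixing, arbitrary here).
ContentHash content_hash(const std::string& type, const std::string& name) {
    ContentHash h = std::hash<std::string>{}(type);
    return h ^ (std::hash<std::string>{}(name) + 0x9e3779b9 + (h << 6) + (h >> 2));
}

// Hypothetical global table: exactly one Handle per distinct atom, no
// matter which atomspaces it belongs to. Collisions share a bucket.
class AtomTable {
    std::unordered_multimap<ContentHash, Handle> table_;
public:
    // Return the existing atom with this content, or create it.
    Handle intern(const std::string& type, const std::string& name) {
        ContentHash h = content_hash(type, name);
        Atom probe{type, name};
        auto range = table_.equal_range(h);
        for (auto it = range.first; it != range.second; ++it)
            if (it->second->same_content(probe)) return it->second;
        Handle fresh = std::make_shared<Atom>(probe);
        table_.emplace(h, fresh);
        return fresh;
    }
    std::size_t size() const { return table_.size(); }
};
```

Interning the same (type, name) twice returns the same Handle, which is the "globally unique atoms" property the proposal relies on.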

This seems like the simplest, most straightforward implementation, and should not be hard. It seems like it will definitely use more RAM than the current design. It will slow down the default TV access by a "little bit" (how much?)

How, exactly, does pattern matching work? It would be a real performance bummer if atomspace membership had to be checked for every single atom during a pattern search. Maybe there could be a way of doing a high-speed "get incoming-set-by-atomspace"? That would solve the problem!?
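One possible shape for a "get incoming-set-by-atomspace" index, as a hypothetical sketch: `SpaceId`, `LinkId` and `IncomingIndex` are made-up names, and links are represented by strings rather than Handles. Incoming links are bucketed by the atomspace that holds them, so a search restricted to one atomspace reads a single bucket instead of checking membership link by link.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using SpaceId = int;          // hypothetical atomspace identifier
using LinkId = std::string;   // stand-in for a Handle to a Link

// Hypothetical per-atom incoming-set index, bucketed by atomspace.
class IncomingIndex {
    std::map<SpaceId, std::vector<LinkId>> by_space_;
public:
    void add(SpaceId space, const LinkId& link) {
        by_space_[space].push_back(link);
    }
    // All incoming links living in the given atomspace (possibly none).
    const std::vector<LinkId>& in_space(SpaceId space) const {
        static const std::vector<LinkId> empty;
        auto it = by_space_.find(space);
        return it == by_space_.end() ? empty : it->second;
    }
};
```

The trade-off is an extra map per atom, i.e. the same RAM-versus-speed tension listed under Cons.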

ngeiswei commented 5 years ago

Remove Values from Atoms. Place Values on AtomSpaceLinks.

@linas, could you expand a bit? I understand you'd want the same atom to hold different value per atomspace, but I don't see how values would be accessed, and at what cost.

Wouldn't it be better to have values remain on atoms but wrapped in a map AtomSpace -> value ?
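The "map AtomSpace -> value" alternative can be sketched like this, assuming a hypothetical integer `SpaceId` for atomspaces and a two-number `TruthValue`; neither matches the real OpenCog classes. The atom itself stays unique, and each atomspace gets its own entry, which also covers the "sky is blue" example from the Pros.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using SpaceId = int;  // hypothetical atomspace identifier

struct TruthValue {
    double strength;
    double confidence;
};

// Hypothetical sketch: one globally unique atom, one value per atomspace.
class Atom {
    std::string name_;
    std::unordered_map<SpaceId, TruthValue> tvs_;
public:
    explicit Atom(std::string name) : name_(std::move(name)) {}
    void set_tv(SpaceId space, TruthValue tv) { tvs_[space] = tv; }
    // Empty if this atom has no value in that atomspace.
    std::optional<TruthValue> get_tv(SpaceId space) const {
        auto it = tvs_.find(space);
        if (it == tvs_.end()) return std::nullopt;
        return it->second;
    }
};
```

Each read now costs a hash lookup instead of a direct member access, which is one concrete form of the performance question raised earlier.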

BTW, this is reminiscent of the old ContextLink; maybe we want to resurrect/improve ContextLink instead of introducing AtomSpaceLink... Just an idea...

linas commented 5 years ago

could you expand a bit?

Is there a specific bullet-point that you'd like more info on? Up top is a list of reasons why this seems like a good idea: it seems to solve a number of different design issues, all with just "one weird trick". I can explain the issues in greater detail...

better to have values remain on atoms but wrapped in a map AtomSpace -> value

Yes, maybe. I'm adding that to the sketch above. I haven't tried to implement this, because of various uncertainties like this.

ContextLink ... resurrect

We can call the new thing "ContextLink"; I was definitely thinking of that while writing this proposal. As to resurrecting ancient code ... noooo! It was horrible, terrible code, and besides, everything has been redesigned maybe three or four times since that code was removed... everything is now completely different. (Actually, it was called ContextualTruthValue, not ContextLink.)

ngeiswei commented 5 years ago

We can call the new thing "ContextLink"

Actually, we probably shouldn't call it ContextLink because it already has a defined PLN semantics (it is almost like an ImplicationLink or InheritanceLink but different). I think the term AtomSpaceLink is right. Then if it turns out ContextLink and AtomSpaceLink can be unified into something more elegant we can do it later.

linas commented 5 years ago

it turns out ContextLink and AtomSpaceLink can be unified into something more elegant we can do it later.

I am not planning on implementing this any time soon; I have a backlog of unfinished work. I've been thinking about this for maybe a year now, but for some reason, there was no distinct github issue for this. This is part of the eliminate-SetLink and the distributed-atomspace on-top-of some-other graph-DB discussions; they're all interconnected.

linas commented 5 years ago

Edit: so if you want to take some time to explain clearly what ContextLink is, what it should do, this is not a bad time.

ngeiswei commented 5 years ago

The PLN book definition of ContextLink is

ContextLink <TV>
   C
   R A B

is equivalent to

R <TV>
   (A AND C)
   (B AND C)

However it's not true for all R. It works if R is an Implication or Inheritance or such, but doesn't necessarily work if R is say an Evaluation. I suspect one needs to assume that the predicate is https://wiki.opencog.org/w/Soggy_Predicates or something like that.
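Written out as formulas, the PLN book definition reads as below. The second line specializes it to R = Inheritance, one of the cases where it is said to hold. The third line assumes, as a simplification not stated in this thread, that an inheritance strength is read as the conditional probability P(B | A).

$$
\begin{aligned}
\mathrm{ContextLink}\langle TV\rangle\bigl(C,\; R(A,B)\bigr) &\equiv R\langle TV\rangle\bigl(A \wedge C,\; B \wedge C\bigr)\\
\mathrm{ContextLink}\bigl(C,\; \mathrm{Inheritance}(A,B)\bigr) &\equiv \mathrm{Inheritance}\bigl(A \wedge C,\; B \wedge C\bigr)\\
P(B \wedge C \mid A \wedge C) &= \frac{P(A \wedge B \wedge C)}{P(A \wedge C)} = P(B \mid A \wedge C)
\end{aligned}
$$

Under that reading, the contextual strength is just the ordinary strength of A to B, restricted to the part of the universe where C holds.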

Anyway, since ContextLink is not currently used I haven't felt the need to get to the root of it yet.

linas commented 5 years ago

Hmm. That is over-specific, over-constrained. I'm contemplating:

ContextLink <any-value>
    C
    A

is equivalent to... umm, yeah, equivalent to, how shall I explain it:

    (A AND C) <any-value>

that is, 

AndLink <any-value>
    C
    A

except more like

MemberLink <any-value>
    C
    A

Hopefully obvious here is that the concepts of "indicator function", "set membership", "set intersection" and "logical-AND" are all kind-of different notations for saying the same thing. So, in a way, I'm trying to demote the atomspace into being "just another set", so that AtomSpaceLink is a lot like a MemberLink.
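For crisp (0/1-valued) membership, the notations named above line up as follows. Reading C as an atomspace, the indicator of C evaluated at A would then play the role of the value carried by MemberLink C A, under the proposal as sketched here.

$$
\begin{aligned}
x \in A \cap C &\iff (x \in A) \wedge (x \in C)\\
\mathbf{1}_{A \cap C}(x) &= \mathbf{1}_A(x)\,\mathbf{1}_C(x) = \min\bigl(\mathbf{1}_A(x),\; \mathbf{1}_C(x)\bigr)
\end{aligned}
$$

The product and the min agree only for 0/1 values; for graded values they are two different choices of AND.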

But the atomspace is not really "just another set", it is a universe of all things, but only one universe out of possibly many. That means that it is a universe in a Kripke frame. See https://en.wikipedia.org/wiki/General_frame; using the notation of that article, GF=<F,R,V> is a general frame, F is the set of all atomspaces (the set of all contexts), R is the set of rules in the rule engine, and V is the set of values. (Yes, I am violating the strict definition given in that WP article; I'm trying to go for the intended meaning). (Part of the intended meaning is that a "context" is just the local universe in which things currently hold, so that roughly, a context, and a local atomspace are the same thing).

I'm less worried about the mathematical preciseness of this, or the need to stick to some specific proof theory; rather I'm just looking for a good, efficient, direct API to multiple atomspaces that resolves the various technical issues we've had, while also providing a good setting for proof theory in general. So I'm willing to say that an AtomSpaceLink is kind-of-like a MemberLink is kind-of-like a ContextLink, is kind-of-like a frame, etc. as long as the final result eventually morphs into a good, efficient, direct API to multiple atomspaces that resolves the various technical issues we've had.

linas commented 5 years ago

Here's maybe another way I'm trying to think of this. In some variants of the backward chainer, you had these BIT things (backward inference trees? some kind of inference trace?) and, according to my understanding of proof theory, each node in that tree corresponds to a, umm, "judgement" in natural deduction https://en.wikipedia.org/wiki/Natural_deduction or, rather, to that subset of the atomspace that is relevant at that particular point in time, for that context of things that have been introduced. And I'm calling the set of those things a "Kripke frame" because they are the set of things that one could possibly infer, given that one has only taken N steps of inference so far. Yes, I am horribly mangling the terminology. And I'm intentionally confusing natural deduction with the sequent calculus (https://en.wikipedia.org/wiki/Sequent_calculus). The reason for the intentional mashup is to make something generic enough to allow all these variants at once (e.g. to allow BIT trees to be efficiently stored, without needing a new C++ structure for them), but mainly to resolve issues #1855 and #1921, with #1502 as a pre-req.

So that https://wiki.opencog.org/w/FilterLink and GetLink become "the same thing". Or perhaps, more accurately, https://wiki.opencog.org/w/MapLink and GetLink become the "same thing", so that (looking at the MapLink wikipage) if one replaces the SetLink by the set of all things in an atomspace, the MapLink just becomes the same thing as a GetLink.
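A toy illustration of why MapLink and GetLink collapse into one operation once an atomspace is "just another set". Everything here is hypothetical: atoms are strings, a "pattern" is a function that either yields a grounding or nothing, and `map_over()`, `AtomSpaceSet` and `get()` are invented names, not Atomese.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Stand-in for atoms: plain strings like "Inheritance cat animal".
using Atom = std::string;
// A pattern either matches an atom (yielding a grounding) or does not.
using Pattern = std::function<std::optional<Atom>(const Atom&)>;

// MapLink-style: apply the pattern to each member of an explicit set.
std::vector<Atom> map_over(const std::vector<Atom>& set, const Pattern& p) {
    std::vector<Atom> out;
    for (const auto& a : set)
        if (auto g = p(a)) out.push_back(*g);
    return out;
}

// Hypothetical: if an atomspace is "just another set", a GetLink-style
// query is the same operation, with the set being the whole atomspace.
struct AtomSpaceSet {
    std::vector<Atom> contents;
};

std::vector<Atom> get(const AtomSpaceSet& space, const Pattern& p) {
    return map_over(space.contents, p);
}
```

In this toy, `get()` adds nothing beyond choosing which set to scan; that is the sense in which the two links become "the same thing".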

Put it a different way: it's somehow clear that we've done the concept of "set" wrong, and this is an attempt to fix this, to turn atomspaces into sets or into contexts.

ngeiswei commented 5 years ago

BTW, most of the complexity of the BIT code goes into sticking rules to inference trees. The Backward Inference Tree itself is merely a population of inference trees, i.e. BindLinks. It has some caches as well to avoid reapplying rules etc., which could probably be replaced by Values. So overall I think it shouldn't be hard to have most or all of the data structure in an AtomSpace.

linas commented 2 years ago

FYI, Some of what is suggested here has been implemented:

AtomSpacePtr is a C++ smart pointer, so AtomSpaces are now reference counted and safe from accidental deletion.
class AtomSpace now inherits from class Atom, and so AtomSpacePtr is a special case of Handle.
Because of this, you can put lists of AtomSpaces into a HandleSeq.
You can even create a Link with AtomSpaces in its outgoing set. However, at this time, this is not supported, and if you try to insert such a Link into some other AtomSpace, it will be rejected (or ignored? I forget.)
The above is unsupported, because the use-case for this is unclear.

The above bullets are distinct from what was proposed in this issue. What was proposed was that there would be another class, very distinct from the current AtomSpace in design, that would act as a "wrapper" around an Atom. Such a wrapper does seem to solve some design problems; the biggest problem is that such a wrapper would seem to chew up more RAM than the current design.
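The inheritance arrangement described in the bullets above can be illustrated with standard smart-pointer conversions. These are simplified stand-ins (plain `std::shared_ptr` aliases and a single virtual method), not the actual OpenCog headers; `collect()` is a hypothetical helper.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-ins; the real classes carry far more state.
class Atom {
public:
    virtual ~Atom() = default;
    virtual std::string type_name() const { return "Atom"; }
};
class AtomSpace : public Atom {
public:
    std::string type_name() const override { return "AtomSpace"; }
};

using Handle = std::shared_ptr<Atom>;
using AtomSpacePtr = std::shared_ptr<AtomSpace>;
using HandleSeq = std::vector<Handle>;

// Because AtomSpace derives from Atom, an AtomSpacePtr converts
// implicitly to a Handle, so atomspaces can sit in a HandleSeq.
HandleSeq collect(const AtomSpacePtr& a, const AtomSpacePtr& b) {
    return HandleSeq{a, b};
}
```

Reference counting is what makes an AtomSpace "safe from accidental deletion": it stays alive as long as any Handle, including one inside a HandleSeq, still points to it.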

linas commented 1 year ago

Closing. Frames have been implemented. There does not (at this time) seem to be any performance advantage from detaching Values from Atoms; there are several ways in which this seems to make things slower. I've been pondering this idea for years, and it is just not working out.

BTW, as to BIT, I assume that frames can handle 80% of what BIT does. Note also there is now a UnifierLink that wraps the unifier, and also a RuleLink that is "just like BindLink but without the setup/static-analysis overhead", and so I think that some large fraction of what URE does, both forward and backward chaining, can now be done in "pure Atomese", using the UnifyLink with the RuleLink (see one of the demos), and then using the Frames to store intermediate results. Can even mix forward and backward chaining. What is missing would be any kind of weighting to halt exploration of branches that are too low in importance.

I am contemplating creating a CacheProxyNode that would be a modernization of the idea of ECAN. The ProxyNode infrastructure makes this "real easy to do". Also, it's now "trivial" to use values coming from any FloatValue, instead of having to use AttentionValues. Any formula can be attached to this (so not plain ECAN, but anything you can write with PlusLink, etc.)

I think I need the CacheProxyNode to manage memory during learning. It has not yet become urgent, but it might get done in the next 6-12 months, maybe. Like I say, it is "easy" because all of the rest of the infrastructure for ProxyNodes is now in place.
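Purely as a speculative sketch of the CacheProxyNode idea (no such class exists yet, and none of these names come from the ProxyNode API): each cached atom carries a vector of floats standing in for a FloatValue, an arbitrary user-supplied formula turns it into a score, and the lowest-scoring atom is evicted once the cache exceeds its capacity. `ScoredCache`, `FloatSeq` and `Formula` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <map>
#include <string>
#include <utility>
#include <vector>

using FloatSeq = std::vector<double>;  // stand-in for a FloatValue
using Formula = std::function<double(const FloatSeq&)>;

// Hypothetical capacity-bounded cache with formula-driven eviction.
class ScoredCache {
    std::size_t capacity_;
    Formula score_;
    std::map<std::string, FloatSeq> entries_;
public:
    ScoredCache(std::size_t capacity, Formula score)
        : capacity_(capacity), score_(std::move(score)) {}

    // Insert or overwrite; evict the lowest scorer if over capacity.
    void put(const std::string& atom, FloatSeq values) {
        entries_[atom] = std::move(values);
        if (entries_.size() > capacity_) evict_lowest();
    }
    bool contains(const std::string& atom) const { return entries_.count(atom) > 0; }
    std::size_t size() const { return entries_.size(); }

private:
    // Linear scan for the minimum score; entries_ is non-empty here.
    void evict_lowest() {
        auto worst = entries_.begin();
        double worst_score = score_(worst->second);
        for (auto it = std::next(entries_.begin()); it != entries_.end(); ++it) {
            double s = score_(it->second);
            if (s < worst_score) { worst = it; worst_score = s; }
        }
        entries_.erase(worst);
    }
};
```

A linear scan keeps the sketch short; a real cache would more likely keep a priority structure. A newly inserted atom can itself be the one evicted if it scores lowest.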