Implement ConsLink as underlying ListLink constructor

ngeiswei commented 6 years ago

The problem of ListLink (as well as SetLink) is that it allows to construct atoms with an arbitrary large outgoing set. There are 2 problems with that

Dealing with arbitrary large outgoing set is problematic for persistent storage (personally I don't really understand why, but @linas says so).
Representing recursive relationships is cumbersome because for instance the pattern matcher has no way to represent a pattern like (List head ...). To circumvent that pattern matcher query on Lists tend to enumerate lists up to a certain size putting variables or constant as elements. (EDIT: Glob http://wiki.opencog.org/w/GlobNode can probably be used to address that).

As a solution we could introduce ConsLink such that

ConsLink
  <head>
  <tail>

would be equivalent to

ListLink
  <head>
  <elements-of-tail>

There would be 2 options to handle this equivalence

Explicitly (using explicit conversion rules).
Implicitly (hardwired in the C++ code, like alpha-equivalence for scope links).

I think it should be handled automatically/implicitly. So for instance if one queries

(cog-execute!
  (Get
    (Cons
      (Concept "head")
      (Variable "$tail"))))

it could get in return all Lists starting by (Concept "head"), or vice versa.

Whether the internal representation is a ListLink or a ConsLink would be up to the internals of the AtomSpace. For instance the persistent storage could use ConsLink whenever ListLink are too large, etc.

I think would also be good to have the same for Set. Maybe it could be called SCons.

linas commented 6 years ago

Keep in mind that atomese is not lambda calculus, its not at all like lisp or scheme. Its very different. Atomese is a lot more like prolog and SQL -- its a knowledge-representation language first, and not so much a programming language.

arbitrary large outgoing set is problematic for persistent storage

This is not a theoretical limitation, its an "implementation detail". For performance reasons, the outgoing set is stored all-togehter, which means a max size of 330 atoms in the outging set. Someone would have to write code to store over-flow-sized outgoing sets in some other table (with an obvious performance degradation).

the pattern matcher has no way to represent a pattern like (List head ...)

What does that mean? We have GlobNode, it works, it can pick out N elements from a list, conforming to some particular signature. With Glob, I think we care close to being able to do anything a regex can do. Maybe not as effeciently, but then regexes search strings with finite state machines, while the pattern matcher searches graphs with a stack machine. But, on a single "level", before a stack push/pop it does look "flat", and thus regex-like. Thus, ListLink is effectively a "flat string" of atoms; it is not a graph (linked list).

linas commented 6 years ago

(cog-execute!
  (Get
    (Cons
      (Concept "head")
      (Variable "$tail"))))

This works today, except you have to say GlobNode instead of VariableNode. This is used very heavily in ghost, and a little bit in the language-learning code. That is, you'd say

(cog-execute!
  (Get
    (SomeLink
      (Concept "head")
      (Glob "$tail"))))

Globs also work correctly with BindLink, so that

(cog-execute!
  (Bind
    (SomeLink
      (Concept "head")
      (Glob "$tail"))
    (OtherLink
       (Glob "$tail") 
       (Concept "the end")))

works "as expected". Globs can be typed, and the length of a glob can be restricted to lie within a range.

Simple typing for Globs works. Complex typing, like SignatureLink is probably buggy/broken.

linas commented 6 years ago

Re: implicit/explicit equivalence -- this is an interesting and difficult topic, so far mostly unexplored. The wiki page for EquivalenceLink says "these things are equivalent", and I think the PLN book lists more. On the MOSES side, combo+reduct says "these things are equivalent". So the question is, "how and when to we deal with equivalence"?

There are two or three or four problems.

There's a (potentially huge) performance issue: every time you want to access atom X, you might first want to search the space of all things Y that are equivalent to X. That's a lot slower than just grabbing X.
There's no a-priori way to specify all-possible equivalences. There might always be a new one, one that we forgot about, and need to add to the list. This suggests that equivalences should be treated as rewrite-rules, which can be applied, or not.
If equivalences really are rewrite rules, then a) they need to be applied b) worried about "confluence". Confluence means: if I apply rewrite-rule A and then rewrite-rule B, do I get the same thing as doing B and then A? Equivalent things should be confluent, but this is not obvious on the face of things.
The general background for this is "satisfiability modulo theories" (SMT). A "theory" is a collection of rules (aka "axioms") that state when two things are equivalent or not.

Having a clean, well-defined SMT solver in the atomspace would be cool. But I think it should be driven by explicit use-cases. The #1 use case I can think of is porting moses-reduct to the atomspace.

So: way back when, I wrote the code in atoms/reduct by being inspired by moses-reduct. PlusLink, TimesLink all handle variables, and try to handle various reduct-like cases. e.g. (Plus (Variable x) (Number 0)) reduces to just (Variable x). (This acutally works, I think, and is tested in a unit test.) The code is in C++, its a bit messy, convoluted and complex. Shortly after finishing the code, I decided "ah heck, this is stupid, one should not hard-code axioms in C++, one should instead have rules and a rule-engine and apply reduct rules, just like moses does today". So I almost deleted everything in that folder.

Flip side: creating rewrite rules that transform x+0 into x is .. cpu intensive. C++ is indeed cheaper.

Flip-again: moses-reduct applies rules, without having a rule-engine. Instead, moses defines a set of rules, and then hard-codes the order in which they should be applied, sometimes multiple times, at different stages. For us, that would be like having a file that specified 5 or 10 or 20 BindLinks (the rules) and another file that very explicitly applied them in a specific, hard-coded order. We have not ever really explored this. Perhaps as-moses can explore this.

Flip-flip: Ideally, one could automatically learn the best hard-coded order, via chaining+pattern-mining. I think you've been trying to do this, but clearly this is not easy.

flip-flip-flip: a lot of my concern above is about efficiency, and so there's the meta-question: how can we make atomspace+pattern-matcher more cpu-efficient? I suspect that this is a hard question, because I think we've squeezed out all we can, in the naive implementation, and to do better requires a radical redesign. But radical re-designs are risky, hard, dangerous, might never work out, require a huge amount of discussion and thinking and talking and planning. Which makes me shy away, at least right now.

linas commented 6 years ago

Yes, creating a Cons link, under the covers, would be an excellent and simple solution to the SQL limitation of atom-sizes. I really, really like that idea. But that link-type should probably exist only in the SQL backend, and nowhere else, as I fear that it will pollute atomese with yet another extraneous feature. I'm worried about the overall complexity of atomese, I'm worried abut adding things that later on generate headaches and need to be maintained.

opencog / atomspace

Implement ConsLink as underlying ListLink constructor #1763