opencog / asmoses

MOSES Machine Learning: Meta-Optimizing Semantic Evolutionary Search for the AtomSpace (https://github.com/opencog/atomspace)
https://wiki.opencog.org/w/Meta-Optimizing_Semantic_Evolutionary_Search
Other
39 stars 31 forks source link

AtomSpace MOSES Port (part I) #3

Closed ngeiswei closed 6 years ago

ngeiswei commented 6 years ago

This is the initial plan to get started with the AS-MOSES port. It only treats the first steps, though does so in details.

Initiate the as-moses repo

AS-MOSES represents a rather major departure from the existing MOSES, I believe it is best to move it to its own repository. This will minimize confusions to the user and increase the awareness that MOSES is transitioning to something different.

Here are the steps involved to seed the as-moses repository with the old MOSES:

  1. [x] Create a doc/as-moses folder under this repo
  2. [x] Move the root README.md under doc/as-moses
  3. [x] Fork the moses repo by rebasing as-moses onto git@github.com:opencog/moses.git and force-push it to git@github.com:opencog/as-moses.git (I'll temporary disable branch protection on the master after step 1 and 2 have been carried).

Replace Combo by Atomese

This is a rather big undertaking and will be done progressively. He start here with 2 tasks

  1. [x] Port fitness evaluation to Atomese
  2. [x] Port Reduct to Atomese For now rely on combo reduct

More tasks like storing the population and meta-population in AtomSpaces will come next.

Port Fitness Evaluation to Atomese

This task can be decomposed into 3 subtasks

  1. [x] Implement Combo to Atomese converter.
  2. [x] Represent problem data in AtomSpace.
  3. [x] Update the Atomese interpreter to handle problem data. Implement dedicated atomese interpreter.
  4. [x] Add Atomese interpreter in fitness evaluation.

Convert Combo to Atomese

To make Atomese program as compact as possible we will use higher level operators, that is working with predicates or concepts as opposed to individuals, let me give some examples.

Let's assume the following data to fit

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

with 2 input features i1, i2, an output feature o, and 3 observations r1 to r3.

In the boolean domain a possible Combo candidate would be:

(or $i1 $i2)

The corresponding Atomese candidate will look like

(Or (Predicate "i1") (Predicate "i2"))

you may notice that the variables have been replaced by predicates. That is because Or is not operating on individuals, but rather on predicates, so here Or actually represent the union of the satisfying sets of i1 and i2. Doing that allows us not to generates more compact candidates.

Let's assume that the domain was actually real numbers, not Boolean. Then a possible Combo candidate would be:

(+ $i1 $i2)

Likewise in Atomese this will be translated into

(Plus (Schema "i1") (Schema "i2"))

Notice that i1 and i2 are schema, not predicates. That is because since the domain is Real, not Boolean, they can no longer be predicates. Likewise Plus can be overloaded for schemata, similarly to how one can add 2 mathematical functions where h = f + g means ForAll x, h(x) = f(x) + g(x)

Let us give another example

(+ $i1 $i2 3)

Its Atomese translation could be

(Plus (Schema "i1") (Schema "i2") 3)

but what is 3? In principle 3 should be a constant function that returns the number 3 for each input. However I suppose it would be acceptable to overload Plus so that it can mix schemata with constants and assume constants are in fact constant functions. So that

(Plus (Schema "i1") (Schema "i2") (Number 3))

would be understood as

(Plus (Schema "i1") (Schema "i2") (Lambda X (Number 3)))

We will see how it goes but I think it's doable.

Note that there is an existing Combo to Atomese converter here

https://github.com/opencog/moses/blob/master/moses/comboreduct/main/combo-fmt-converter.cc

but it is extremely limited and outputs strings, while what we want is something that builds atoms directly.

The code can probably be added in a converter folder of

https://github.com/opencog/moses/tree/master/moses/comboreduct

it should not require an atomspace (it should be up to the user to add it to the atomspace of his/her choice). So rather than using functions such as AtomSpace::add_link it should use createLink, etc.

Represent Problem Data in AtomSpace

In order to use the Atomese interpreter to measure the fitness of a program over some data, the data need to be loaded to some AtomSpace.

Let us reconsider the example data

+--+--+--+--+
|  |i1|i2|o |
+--+--+--+--+
|r1|0 |1 |1 |
+--+--+--+--+
|r2|1 |0 |1 |
+--+--+--+--+
|r3|0 |0 |0 |
+--+--+--+--+

obtained from the CSV file

i1,i2,o
0,1,1
1,0,1
0,0,0

over the Boolean domain for now.

We need to tell how instances/observations relate to features. Here is how it could be done. For instance r1 could be represented as

(Evaluation (stv 0 1)
  (Predicate "i1")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "i2")
  (Node "r1"))
(Evaluation (stv 1 1)
  (Predicate "o")
  (Node "r1"))

Assuming the domain of data is Real instead of Boolean, then Execution must be used instead of Evaluation. For instance r1 would be represented as

(Execution
  (Schema "i1")
  (Node "r1")
  (Number 0))
(Execution
  (Schema "i2")
  (Node "r1")
  (Number 1))
(Execution
  (Schema "o")
  (Node "r1")
  (Number 1))

This can be made more compact by considering that a function, say f, is a set of pairs (x, f(x)). For instance i1 can be described with

(Similarity (stv 1 1)
  (Schema "i1")
  (Set
    (List (Node "r1") (Number 0))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))))

Or even more compact considering the Cartesian product over features, using ListLink, to represent a Cartesian product between functions over the same domain.

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

I recommend to use the last representation as it is more compact and I suspect might actually be easier for the interpreter to process.

Update the Atomese Interpreter to Handle Problem Data

Assuming the AtomSpace is loaded with the table above, how to evaluate

(Plus (Schema "i1") (Schema "i2"))

?

Such schema i1+i2 is expected to be represented as

(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r1")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r2")
  (Number 1))
(Execution
  (Plus (Schema "i1") (Schema "i2"))
  (Node "r3")
  (Number 0))

Or equivalently, seeing a function as a set of input/output pairs

(Similarity (stv 1 1)
  (Plus (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (Number 1))
    (List (Node "r2") (Number 1))
    (List (Node "r3") (Number 0))

However when invoking the cog-execute! on

(Plus (Schema "i1") (Schema "i2"))

the result could simply be

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))

The Atomese interpreter would recognize that Plus is applied to schemata, gather their associated data, iteratively apply the lower level operator Plus to the associated numbers, and reconstruct the result as a mapping from inputs (observations r1 to r3) to outputs ((Number 1), etc).

Replace Combo by Atomese Interpreter in Fitness Evaluation

Due to Atomese Reduction not supported, we can only support Atomese right after reduction. Basically, intances will be turned into reduced combo trees, then turned into Atomese program, then fitness evaluated. Let's recall how the fitness function is being called, in the hill-climbing algorithm, the call occurs here

https://github.com/opencog/moses/blob/master/moses/moses/optimization/hill-climbing.cc#L234

which is an iscorer (for instance scorer), so we want to implement a new class inheriting iscorer_base

https://github.com/opencog/moses/blob/master/moses/moses/representation/instance_scorer.h#L35

that turns the instance into an Atomese program and evaluate its fitness. Then upgrade fitness functions to support Atomese programs.

To break it down, the subtasks are

  1. [x] Rename complexity_based_scorer to combo_based_scorer
  2. [x] Implement atomese_based_scorer similar to combo_based_scorer, that turns the instance into a reduced combo_tree, convert it to atomese, then call the fitness on this atomese program.
  3. [x] Overload bscore_base::operator() (with a default dummy implementation to allow it to be optional for now). Since instance_scorer.h will start growing, it would be best to create a instance_scorer.cc and move the implementations there.
  4. [x] Start implementing it for various fitness functions. I recommand to start with logical_bscore which is probably the simplest fitness function type.
  5. [x] ~~Test MOSES using atomese_based_scorer instead combo_based_scorer for the implemented fitness function types.~~ Will be done in the next iteration.

Port Reduct to Atomese

The approach suggested at the moment is to explicitely represent the result of a reduction as a relationship between Atomese program and normal form. I think it makes sense to introduce a link type just for that called ReductLink For instance

(ReductLink 
  (And
    (Predicate "f1")
    (Predicate "f1"))
  (Predicate "f1")))

would represent that f1 and f1 reduces to f1.

To acheive that, ReductLink as well as the operators involved in the Atomese programs must be axiomatized. Then the URE alone should be able to perform the reduction.

This work is already under way, so far by Yidne.

If its completion takes too long, once we have a Combo to Atomese converter and vise versa, we could probably just wrap the existing Combo reduct engine into a URE rule, and use this as a temporary replacement for Atomese reduction just so that we can keep going with the remaining of the port.

linas commented 6 years ago

Instead of Similarity (stv 1 1) Don't you want to use Equivalence ? I know that in PLN, these are the "same thing". However, equivalence suggests a definition, while similarity suggests a possibility. I'd like to keep formally defined, externally imposed definitions distinct from statistically-discovered similarities.

linas commented 6 years ago

This thing: (Span (Schema "i1") (Schema "i2") (Schema "o")) is not a category-theoretic span in any way.

ngeiswei commented 6 years ago

According to the PLN book Similarity is for concepts, while Equivalence is for predicates. Since these functions are seen as sets, which are concepts, Similarity should be the right link type.

linas commented 6 years ago

OK, I guess we can use Similarity for concepts. However, I want to abolish SetLink in favor of MemberLink ...

linas commented 6 years ago

So:

(Member (stv 1 1)
  (List (Node "r1") (Number 0))
  (Schema "i1"))

(Member (stv 1 1)
  (List (Node "r2") (Number 1))
  (Schema "i1"))

(Member (stv 1 1)
  (List (Node "r3") (Number 0))
  (Schema "i1"))
ngeiswei commented 6 years ago

Regarding SpanLink I'd love to have the right term. I'm not category theory expert and I couldn't find a perfect term. In my Category Theory bible, Category Theory for the Sciences, they do mention some form of Cartesian product over the same domain, but they don't give it a name! They just say, page 44, Section 3.1.1.9 " We say this function is induced by f and g, and we denote it <f,g>:A->X x Y, where <f,g>(a)=(f(a),g(a)) " So what I am supposed to call that? InducedLink?

Then I saw further down the book the introduction of span, which I know is no is not exactly that, but is the closest I could find. Other suggestions are welcome.

ngeiswei commented 6 years ago

I don't think we want to abolish SetLink, in some places it really makes things more compact, and unlike sets built with MemberLink, it is immutable.

However what we do need is a construct to recursively step into a set, like Cons for List, perhaps should it be called ConstSet or something...

ngeiswei commented 6 years ago

Regarding span, just to be absolutely clear, the reason I'm not using List (which could be a good candidate to represent Cartesian product), or perhaps more clearly CartesianProd is because it is not exactly what I want. CartesianProd would duplicate the inputs, and in order to minimize redundancy I don't want that.

To reuse the same example, using CartesianProd instead of Span would give

(Similarity (stv 1 1)
  (CartesianProd (Schema "i1") (Schema "i2") (Schema "o"))
  (Set
    (List (List (Node "r1") (Node "r1") (Node "r1")) (List (Number 0) (Number 1) (Number 1)))
    (List (List (Node "r1") (Node "r1") (Node "r2")) (List (Number 0) (Number 0) (Number 1)))
    ...

You see the duplicated inputs, (List (Node "r1") (Node "r1")... That "span" link, or "Induced" link, or whatever-is-the-right-term link, is just to avoid that.

ngeiswei commented 6 years ago

Actually CartesianProd is a lot worse than that, because it would consider all combinations of inputs (I've updated the message above to reflect that). So we really need another link type, it just needs a name.

linas commented 6 years ago

There's no such thing as a Cartesian product in category theory. The Cartesian product is a completely unrelated concept from some other completely different area of mathematics, got nothing to do with anything.

That said, the Cartesian product is an example of a https://en.wikipedia.org/wiki/Pullback_(category_theory) - its an example of a pullback, taken over a singleton (a single point). Pullbacks are generically a way of gluing two other objects together, over some common object to which they are both related. That is, if X and Y are two objects, having Z in common between them, then there exists a unique, universal object P which is the gluing-together of X and Y over Z.

The Cartesian product is only defined over the category of sets (i.e. over set theory). The only thing that two generic sets have in common is that they are made out of points (singletons, sets containing only one element). Thus, when gluing together sets, one takes Z == a singleton == some arbitrary (any) set containing a single element. The pullback is built by mapping each and every element of X to the singleton. Also map each and every element of Y to the singleton. Think of each of these as being like peacocks, with a fan of arrows. Glue these two fan-shaped things together at the singleton (two peacocks having homo sex!?). The result looks like a fan of arrows all pointing into Z, and that result can be uniquely, universally represented by pairs of objects, one chosen from X and one from Y. But the set of all possible pairs is conventionally called the "Cartesian product".

Not all categories have Cartesian products. Set theory is an example of a category that does have a Cartesian product. Not all categories have pullbacks. If they exist, they have to be proven to exist.

Most of what we do in opencog is much closer to model theory than to category theory, and so I'm kind-of sorry I ever talked about category theory w.r.t. opencog.

linas commented 6 years ago

Part 2:

" We say this function is induced by f and g, and we denote it <f,g>:A->X x Y, where <f,g>(a)=(f(a),g(a)) "

That is an example of a Cartesian product. For example: https://en.wikipedia.org/wiki/Product_(category_theory) where the product can be thought of as the "trivial" pullback. In the category of sets, if you have a function f:A->X and a function g:A->Y, then you can prove that there exists a unique, universal function <f,g>:A->X x Y that maps an element a to the pair (f(a), g(a)).

For set theory, this is kind of trivial and goofy and "self-evident" (to some typical college freshman). The issue being tackled here is that there are categories for which products do not exist. When products do exist, then there are certain generic statements and proofs that can be made, that would apply to all categories that have products. This avoids the trouble of having to write the same proof, over and over, once for each different kind of category. (viz, once for set theory, once for group theory, once for topology, etc.) Its a short-cut.

For moses, pretty much everything is naive set theory (well, that, and relational algebra). The notion of a table of rows exists only in set theory: each row is a cartesian product, a "tuple", and the table itself is a set. In set theory, a map from a "row" or "tuple" to some value is just a plain-old ordinary function. In set theory, morphisms are just plain-old functions.

So what I am supposed to call that? InducedLink?

Does it need a name? Call it RowLink or SequenceLink or OrderedSetLink or TupleLink or or VectorLink or ArrayLink or just .. ListLink.

The biggest problem in opencog is that ListLinks are not computer-science lists; they aren't editable (the way a singly-linked list is). They should have been called VectorLinks or TupleLinks or ArrayLinks, to emphasize their immutability.

linas commented 6 years ago

SetLink makes things more compact, and unlike sets built with MemberLink, it is immutable.

During learning, while moses is running, one might want to alter the set of inputs. This becomes hard and klunky with set links.

linas commented 6 years ago

You see the duplicated inputs, (List (Node "r1") (Node "r1")), ... That "span" link, or "Induced" link, or whatever-is-the-right-term link, is just to avoid that.

I don't understand why there is some need "duplicate the inputs". At any rate, Cartesian products and functions have nothing to do with one-another; they are unrelated concepts.

The tables in moses are just tables. You do NOT have to specify which column is the dependent column in advance. Just build the table; then, later on, you can indicate that column p42 is the one that moses should train on. This way, you can just throw some generic tables into the atomspace, and decide, at a later point in time, which columns should be trained on. Maybe a table has two or three dependents, and you want to run two or three instances of moses, training on each.

linas commented 6 years ago

Re tables: I have a comment, kind-of off to the side, but -- in language learning, I have tables that are extremely sparse, with only one out of a million or one out of a billion entries that are non-zero. Representing these as ordinary tables would blow out RAM. (They already blow out RAM, even when they are sparse...)

linas commented 6 years ago

I'm working on an alternate way of representing tabular data right now, but decided to distract myself and read email today. Upshot, I think there's a simpler way of representing data than what's given above.

ngeiswei commented 6 years ago

@linas totally open to have alternate ways of representing tabular data, spare, dense, etc. However, keep in mind that this table will then have to be used to evaluate Atomese programs. Wouldn't it be really cool if the semantics corresponding to these tables directly guides how to evaluate Atomese programs?

This is the reason why I didn't use a mere Link to group together columns. One way to see columns is as function descriptions, mapping inputs to outputs. Likewise one way to see tuples of columns (table) could be as groups of functions (mapping inputs to outputs for each function), etc. Doing that means we don't need to hack the Atomese interpreter (aka pattern matcher) specifically to handle MOSES tables. If we use arbitrary links, without worrying about their semantics, I think we get into the risk of bloating the Atomese interpreter.

ngeiswei commented 6 years ago

For example: https://en.wikipedia.org/wiki/Product_(category_theory) where the product can be thought of as the "trivial" pullback.

Well, that is exactly what I was looking for. So it's just a mere product (with a codomain that is defined as the Cartesian product of the codomains of the features, that is what confused me). So the link type name I'm after is really a ProductLink. So LinkLink might be just fine, if we agree to give it the semantics of a product.

linas commented 6 years ago

LinkLink

??? ListLink is already a link that implements a Cartesian product. In fact, every link type that derived from OrderedLink is a Cartesian product. Why in the heck do we need yet another Cartesian product link? What's wrong with just OrderedLink?

ngeiswei commented 6 years ago

I meant ListLink.

In fact, every link type that derived from OrderedLink is a Cartesian product.

That feels true.

Yidnekachew commented 6 years ago

I'm working on an alternate way of representing tabular data right now, but decided to distract myself and read email today.

@linas Can I have a look what that alternative way looks like & when are we going to get it?