Executing non Grounded Schema based on ExecutionLink records

ngeiswei commented 6 years ago

We likely need (as part of the effort of porting MOSES to the Atomspace) a way to use declarative knowledge about a Schema to actually run it.

For instance, given a mapping between inputs and outputs, recorded with http://wiki.opencog.org/w/ExecutionLink and defining some schema f, such as

(ExecutionLink
  (Schema "f")
  <input-1>
  <output-1>)
...
(ExecutionLink
  (Schema "f")
  <input-n>
  <output-n>)

Executing (via the Atomese interpreter https://github.com/opencog/atomspace/blob/master/opencog/atoms/execution/Instantiator.h#L141)

(ExecutionOutputLink
  (Schema "f")
  <input-i>)

should return the corresponding output

<output-i>

Some care would be required when dealing with undefined or duplicated values (though I guess it'd be OK, for a start, to just raise an exception when things are ill-defined or undefined, or to leave the body unchanged).
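The lookup semantics above can be sketched in plain Python. This is only a model, not the AtomSpace API: each ExecutionLink is represented by a hypothetical (schema name, input, output) triple, and `build_table`, `execute` and the two exception classes are names made up for illustration. It takes the "raise an exception" option for both undefined inputs and conflicting duplicates.

```python
from typing import Dict, Hashable, Iterable, Tuple

# Hypothetical stand-in for an ExecutionLink: (schema name, input, output).
Record = Tuple[str, Hashable, Hashable]


class UndefinedInput(LookupError):
    """No record maps this input to an output."""


class ConflictingOutputs(ValueError):
    """Two records map the same input to different outputs."""


def build_table(records: Iterable[Record], schema: str) -> Dict[Hashable, Hashable]:
    """Collect the input -> output mapping recorded for one schema."""
    table: Dict[Hashable, Hashable] = {}
    for name, inp, out in records:
        if name != schema:
            continue
        # An exact duplicate is harmless; a different output is a conflict.
        if inp in table and table[inp] != out:
            raise ConflictingOutputs(
                f"{schema}: {inp!r} maps to {table[inp]!r} and {out!r}")
        table[inp] = out
    return table


def execute(records: Iterable[Record], schema: str, inp: Hashable) -> Hashable:
    """Model of executing (ExecutionOutputLink (Schema schema) inp)."""
    table = build_table(records, schema)
    if inp not in table:
        raise UndefinedInput(f"{schema} has no record for input {inp!r}")
    return table[inp]
```

Note that this naive version rescans every record on each call, which is exactly the cost discussed next.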

Some care would also be required to not slow down the interpreter every time it encounters a schema, as we probably don't want it to query the entire atomspace for ExecutionLinks when that happens, or do we? Maybe one could restrict such behavior to DefinedSchemaNode. Or perhaps ExecutionLink could have a factory that stores all inputs/outputs as values of the considered schema. Or perhaps we could introduce a SchematizationLink that could do just that (i.e. build a partial function from a record of ExecutionLinks):

(DefineLink
  (DefinedSchema "schematization-of-f")
  (SchematizationLink (Schema "f")))

Then one would be able to run (DefinedSchema "schematization-of-f") but not (Schema "f").
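A rough plain-Python model of that idea, under stated assumptions: `Schematization`, `define`, `run` and `NotExecutable` are hypothetical names, records are (schema name, input, output) triples standing in for ExecutionLinks, and the table is a snapshot taken at definition time (whether a real SchematizationLink should track records added later is left open here).

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

Record = Tuple[str, Hashable, Hashable]  # (schema name, input, output)


class NotExecutable(Exception):
    """Raised when asked to run a name that has no definition."""


class Schematization:
    """Partial function built once from the records of one schema."""

    def __init__(self, records: Iterable[Record], schema: str) -> None:
        # Single pass over the records; later calls never rescan them.
        self._table = {inp: out for name, inp, out in records if name == schema}

    def __call__(self, inp: Hashable) -> Hashable:
        return self._table[inp]  # KeyError if the input was never recorded


# Model of DefineLink: defined-schema name -> executable body.
definitions: Dict[str, Callable[[Hashable], Hashable]] = {}


def define(name: str, body: Callable[[Hashable], Hashable]) -> None:
    if name in definitions:
        raise ValueError(f"{name} is already defined")
    definitions[name] = body


def run(name: str, inp: Hashable) -> Hashable:
    if name not in definitions:
        raise NotExecutable(f"{name} has no executable definition")
    return definitions[name](inp)
```

With this model, `define("schematization-of-f", Schematization(records, "f"))` makes `run("schematization-of-f", x)` a dictionary lookup, while `run("f", x)` still fails because only the defined name is executable.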

ngeiswei commented 6 years ago

SchematizationLink could be understood as something more sophisticated like turning data into a regressed model (like MOSES would), so maybe to distinguish it from that it could be named TablelizationLink, or likely some better name.

linas commented 6 years ago

I'm not clear what you are describing here. So let me make 3-4 random remarks. First, there currently exist the following (under-utilized) links:

http://wiki.opencog.org/w/SignatureLink http://wiki.opencog.org/w/ArrowLink

The ArrowLink is meant to describe the inputs and outputs of a function. The SignatureLink is meant to define a signature in general (not just a function signature). It's suitably polymorphic, I think. These links are inspired by, and are intended to capture the essence of, arrows and signatures, as described in books on term rewriting (e.g. Baader & Nipkow) or model theory (Wilfrid Hodges) or proof theory or logic in general. Wikipedia describes them.

There's some code and some unit tests for them, but not a lot.

When I first read what you wrote, I thought that maybe the SchematizationLink is one or the other, or some combination, of these two links. On closer reading, I see that it's not... see next note...

linas commented 6 years ago

On second reading, I see this: "probably don't want to query the entire atomspace for ExecutionLink". Well, you don't have to. If you have the SchemaNode, you merely ask for all members of its incoming set that are of type ExecutionLink, in either C++ or Scheme:

(cog-incoming-by-type (Schema "foo") 'ExecutionLink)

and bingo you've got them all.
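Why this avoids a global scan can be illustrated with a toy model in plain Python. It is not the AtomSpace implementation: `TinySpace`, `add_link` and `incoming_by_type` are hypothetical names, and atoms are modelled as plain tuples. The point is only that each atom carries an index of the links that contain it, so the query touches that atom's incoming set and nothing else.

```python
from typing import Dict, Hashable, List, Tuple

Atom = Hashable
Link = Tuple  # (link type, outgoing atom, outgoing atom, ...)


class TinySpace:
    """Toy atom store that maintains a per-atom incoming set."""

    def __init__(self) -> None:
        # atom -> links that have this atom in their outgoing set
        self._incoming: Dict[Atom, List[Link]] = {}

    def add_link(self, link_type: str, *outgoing: Atom) -> Link:
        link = (link_type, *outgoing)
        # dict.fromkeys drops repeats, so a link is indexed once per atom.
        for atom in dict.fromkeys(outgoing):
            self._incoming.setdefault(atom, []).append(link)
        return link

    def incoming_by_type(self, atom: Atom, link_type: str) -> List[Link]:
        """Links of the given type that contain `atom`; no global scan."""
        return [l for l in self._incoming.get(atom, []) if l[0] == link_type]


space = TinySpace()
f = ("SchemaNode", "f")
g = ("SchemaNode", "g")
space.add_link("ExecutionLink", f, ("NumberNode", "1"), ("ConceptNode", "a"))
space.add_link("ExecutionLink", f, ("NumberNode", "2"), ("ConceptNode", "b"))
space.add_link("EvaluationLink", ("PredicateNode", "p"), f)
space.add_link("ExecutionLink", g, ("NumberNode", "1"), ("ConceptNode", "z"))

records_of_f = space.incoming_by_type(f, "ExecutionLink")
```

Here `records_of_f` holds only the two ExecutionLink records of `f`; the EvaluationLink and the record for `g` are never visited as candidates for `f`'s table.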

linas commented 6 years ago

Also, note that the matrix/vector code does "tableization", at least in the way that works for me. Whereas you have

(ExecutionLink
  (Schema "f")
  <input-1>
  <output-1>)
...
(ExecutionLink
  (Schema "f")
  <input-n>
  <output-n>)

I have

(FooLink
  (BarNode "f")
  <left-1>
  <right-1>)
...
(FooLink
  (BarNode "f")
  <left-n>
  <right-n>)

Much of the code uses the words "row" and "column" for "left" and "right" ... it's the same thing. You can also think of each row or each column as a vector, so the matrix is a collection of vectors.

The code is meant to solve the following problems:

1) work very well for extremely sparse matrices e.g. only one-in-a-million non-zero entries.

2) map any kind of atomspace structure into matrix/vector form. Foo and Bar can be anything, and left, right can be anywhere. e.g.

(FooLink
  (StuffLink  <left-n>)
  (Other (Different (BarNode "f") (PlaceLink  <right-n>))))

There doesn't even need to be a FooLink -- any pattern match to find left, right will work.

3) provide typical row and column marginal sums, statistics, probabilities, entropies, mean-square lengths, cosine angles, Jaccard distances, etc.
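To make the row/column view concrete, here is a small plain-Python sketch of a sparse matrix keyed by (left, right) pairs. It is not the matrix code referred to above: `SparseMatrix` and its methods are hypothetical names, the numeric weight per pair is an assumption, and the Jaccard distance shown is the plain set-based one over non-zero columns, which may differ from whatever variant that code uses.

```python
from math import sqrt
from typing import Dict, Hashable, Iterable, Tuple

Entry = Tuple[Hashable, Hashable, float]  # (left/row, right/column, weight)


class SparseMatrix:
    """Stores only non-zero entries, as one dict of columns per row."""

    def __init__(self, entries: Iterable[Entry]) -> None:
        self.rows: Dict[Hashable, Dict[Hashable, float]] = {}
        for left, right, weight in entries:
            row = self.rows.setdefault(left, {})
            row[right] = row.get(right, 0.0) + weight

    def row_sum(self, left: Hashable) -> float:
        """Marginal sum over all columns of one row."""
        return sum(self.rows.get(left, {}).values())

    def col_sum(self, right: Hashable) -> float:
        """Marginal sum over all rows of one column."""
        return sum(row.get(right, 0.0) for row in self.rows.values())

    def cosine(self, a: Hashable, b: Hashable) -> float:
        """Cosine of the angle between two rows viewed as vectors."""
        ra, rb = self.rows.get(a, {}), self.rows.get(b, {})
        dot = sum(w * rb[k] for k, w in ra.items() if k in rb)
        norm = sqrt(sum(w * w for w in ra.values())) * \
            sqrt(sum(w * w for w in rb.values()))
        return dot / norm if norm else 0.0

    def jaccard_distance(self, a: Hashable, b: Hashable) -> float:
        """1 - |A & B| / |A | B| over the non-zero columns of two rows."""
        sa, sb = set(self.rows.get(a, {})), set(self.rows.get(b, {}))
        union = sa | sb
        return 1.0 - len(sa & sb) / len(union) if union else 0.0


m = SparseMatrix([("a", "x", 1.0), ("a", "y", 2.0),
                  ("b", "y", 2.0), ("b", "z", 1.0)])
```

Storage and every statistic here scale with the number of non-zero entries rather than with rows times columns, which is what makes the one-in-a-million sparsity case tractable.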

linas commented 6 years ago

Last comment: For your data, if you have

(ExecutionLink
  (Schema "f")
  <input-1>
  <output-1>)
...
(ExecutionLink
  (Schema "f")
  <input-n>
  <output-n>)

and if the <input-k> and <output-k> are time-varying, and if you don't need to pattern-match them, then use Values, not Atoms, for them. It's more efficient: it uses (an order of magnitude) less RAM and is (an order of magnitude) faster to modify.

ngeiswei commented 6 years ago

The tableization of the matrix code seems interesting, thanks for the feedback.

It turns out such "schematization" won't be needed soon (in as-moses), so this issue will likely remain pending for the next few months.

opencog / atomspace

Executing non Grounded Schema based on ExecutionLink records #1795