Some equivalent Atomese program representations for manipulation and evaluation

ngeiswei commented 6 years ago

Overview

This issue contains some considerations regarding various ways Atomese programs could be represented, manipulated and evaluated.

Warning: it is not a plan for immediate actions, just some considerations.

Motivation

As suggested by @Bitseat, one way to avoid hacking the Atomese interpreter too much (https://github.com/opencog/atomspace/blob/master/opencog/atoms/execution/Instantiator.h#L141) would be to unfold an Atomese program so that it is readily interpretable by Instantiator::execute.

For instance, given the data set represented as

(Similarity (stv 1 1)
  (List (Schema "o") (Schema "i1") (Schema "i2"))
  (Set
    (List (Node "r1") (List (Number 1) (Number 0) (Number 1)))
    (List (Node "r2") (List (Number 1) (Number 1) (Number 0)))
    (List (Node "r3") (List (Number 0) (Number 0) (Number 0)))))

and the combo program

(Plus (Schema "i1") (Schema "i2"))

It could be unfolded into

(Set
  (List (Node "r1") (Plus (Number 0) (Number 1)))
  (List (Node "r2") (Plus (Number 1) (Number 0)))
  (List (Node "r3") (Plus (Number 0) (Number 0))))

which, when passed to the Atomese interpreter, would return the desired result

(Set
  (List (Node "r1") (Number 1))
  (List (Node "r2") (Number 1))
  (List (Node "r3") (Number 0)))
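
To make the unfolding step concrete, here is a minimal sketch in plain Python rather than Atomese. It models the data set as a header plus rows and the program as a nested tuple. The names `unfold` and `evaluate` are hypothetical helpers, not part of the AtomSpace API, and only Plus is handled.

```python
# Plain-Python model of the unfolding; nothing here is an OpenCog API.
HEADER = ("o", "i1", "i2")
ROWS = {"r1": (1, 0, 1), "r2": (1, 1, 0), "r3": (0, 0, 0)}
PROGRAM = ("Plus", "i1", "i2")


def unfold(program, header, values):
    """Replace each feature name in the program by its value in one row."""
    if isinstance(program, str):
        return values[header.index(program)]
    if isinstance(program, (int, float)):
        return program
    op, *args = program
    return (op, *(unfold(arg, header, values) for arg in args))


def evaluate(expr):
    """Evaluate an unfolded expression (only Plus is supported here)."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    if op == "Plus":
        return sum(evaluate(arg) for arg in args)
    raise ValueError(f"unsupported operator: {op}")


unfolded = {name: unfold(PROGRAM, HEADER, vals) for name, vals in ROWS.items()}
results = {name: evaluate(expr) for name, expr in unfolded.items()}
```

Here `unfolded` plays the role of the intermediate Set of Plus expressions, and `results` the role of the final Set of Numbers.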

However, I think we can take a middle-ground approach: the unfolding would be much lighter, and it would not require hacking the interpreter so that Plus, etc. support higher-level inputs. (Such support is probably fine and desirable in the long run, but since we are at an exploratory stage we want to avoid potentially unnecessary and complicated hacking.) I also suspect that this sort of lightweight unfolding will benefit subsequent Atomese program processing, such as finding patterns in a population of programs and evaluating them on new inputs.

Proposal

So here it goes. For instance, given (Plus (Schema "i1") (Schema "i2")), the first level of unfolding could be (using the unimplemented FunMapLink, with f1 and f2 standing for the input features i1 and i2)

(FunMap
  (List
    (Variable "$R")
    (Lambda
      (Variable "$R")
      (Plus
        (ExecutionOutput
          (Schema "f1")
          (Variable "$R"))
        (ExecutionOutput
          (Schema "f2")
          (Variable "$R")))))
  (Domain))

where FunMap is to be distinguished from MapLink (http://wiki.opencog.org/w/MapLink): it assumes that its first argument is a function rather than a pattern, and thus has the same semantics as map in Haskell (https://hackage.haskell.org/package/base-4.11.1.0/docs/Prelude.html#v:map) or in Scheme (https://srfi.schemers.org/srfi-1/srfi-1.html#FoldUnfoldMap).

And Domain is just something that retrieves the row names, r1 to r3, and should probably be written

(Domain (List (Schema "f1") (Schema "f2")))

but is just written (Domain) here for simplicity.

Written in a more casual functional programming style, it would be

(map (lambda (r) (cons r (+ (f1 r) (f2 r)))) (domain))
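
The same reading can be sketched in plain Python, assuming the features are ordinary functions from row names to values. `TABLE`, `domain`, `f1`, `f2` and `fun_map` are illustrative names only; FunMapLink itself is unimplemented, so this only mirrors the intended map semantics.

```python
# Rows keyed by name; f1 and f2 look a feature value up for a given row name.
TABLE = {
    "r1": {"f1": 0, "f2": 1},
    "r2": {"f1": 1, "f2": 0},
    "r3": {"f1": 0, "f2": 0},
}


def domain():
    """Return the row names, in table order."""
    return list(TABLE)


def f1(r):
    return TABLE[r]["f1"]


def f2(r):
    return TABLE[r]["f2"]


def fun_map(fn, items):
    """Apply a function (not a pattern) to every item, like map."""
    return [fn(item) for item in items]


pairs = fun_map(lambda r: (r, f1(r) + f2(r)), domain())
```

Each element of `pairs` couples a row name with the program's output on that row, as the `cons` does in the Scheme one-liner.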

Alternatively, as suggested by @kasimebrahim, one could use PutLink

(Put
  (Variable "$R")
  (List
    (Variable "$R")
    (Put
      (Lambda
        (Variable "$R")
        (Plus
          (ExecutionOutput
            (Schema "f1")
            (Variable "$R"))
          (ExecutionOutput
            (Schema "f2")
            (Variable "$R"))))
      (Variable "$R")))
  (Domain))

The next unfolding, which is probably the most interesting, is

(FunMap
  (List
    (Variable "$R")
    (Put
      (Lambda
        (VariableList
          (Variable "$X")
          (Variable "$Y"))
        (Plus
          (Variable "$X")
          (Variable "$Y")))
      (Lambda
        (Variable "$R")
        (List
          (Schema "f1")
          (Schema "f2")))))
  (Domain))

because it exposes the heart of the program

      (Lambda
        (VariableList
          (Variable "$X")
          (Variable "$Y"))
        (Plus
          (Variable "$X")
          (Variable "$Y")))

then links it to the inputs i1 and i2 using Put, then applies it to the domain r1 to r3. The good thing about this representation is that it abstracts away the features (which can make it easier to reason about some patterns). It also makes the program easier to evaluate on new inputs, because only one place needs to change: replace (Domain) by, say, (NewDomain).
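
As a rough illustration of that separation, the Python sketch below keeps the core function, the binding to features, and the domain as three independent pieces. All names (`core`, `feature`, `put`, `fun_map`) and the extra row `r4` are assumptions made for this example, not Atomese semantics.

```python
def core(x, y):
    """The heart of the program: it knows nothing about features or rows."""
    return x + y


def feature(name):
    """Build a feature function mapping a row record to one of its values."""
    return lambda row: row[name]


def put(fn, features):
    """Bind the core function's arguments to feature functions."""
    return lambda row: fn(*(f(row) for f in features))


def fun_map(fn, domain):
    """Apply the bound program to every row, keeping the row name."""
    return [(name, fn(row)) for name, row in domain.items()]


DOMAIN = {
    "r1": {"f1": 0, "f2": 1},
    "r2": {"f1": 1, "f2": 0},
    "r3": {"f1": 0, "f2": 0},
}
# Made-up new input, only to show that nothing but the domain changes.
NEW_DOMAIN = {"r4": {"f1": 2, "f2": 3}}

program = put(core, [feature("f1"), feature("f2")])
old_results = fun_map(program, DOMAIN)
new_results = fun_map(program, NEW_DOMAIN)
```

Evaluating on new inputs only swaps the last argument of `fun_map`; `core` and the feature binding stay untouched.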

kasimebrahim commented 6 years ago


; get the node containing the row name from the row 'R' [(car row)]
(define (rowname R)
    ....)

; this yields a list containing the value of each feature
; for a given row 'R' [(cdr row)]
(define (row R)
    ....)

; get all the features as List of Variable Nodes from the problem data
(define (featureVariables problemData)
    ....)

; this is the program over all the features
(DefineLink
    (DefinedSchemaNode "programOne")
    (Lambda
        (VariableList
            (ExecutionOutputLink
                 (GroundedSchemaNode "featureVariables")
                 (Node "ProblemData")))
        (Plus
            (VariableNode "$f1")
            (VariableNode "$f2"))))

; get Domain
(DefineLink
    (DefinedSchemaNode "domain")
    (SetLink
        (ListLink (Node "r1") (ListLink (Number 1) (Number 0)))
        (ListLink (Node "r2") (ListLink (Number 0) (Number 1)))))

; this is the unfolding of the program
(PutLink
    (VariableNode "$R")
    (ListLink
        (ExecutionOutputLink
            (GroundedSchemaNode "rowname")
            (VariableNode "$R"))
        (ExecutionOutputLink
            (DefinedSchemaNode "programOne")
            (ExecutionOutputLink
                (GroundedSchemaNode "row")
                (VariableNode "$R"))))
    (DefinedSchemaNode "domain"))

This could be an alternative for the second unfolding, but as you can see it already has some problems. For instance, to have a generic program that works on a generic domain, I needed a procedure, "featureVariables", to declare the variable names. That won't work, because a VariableList can't be created from a List containing VariableNodes, and it may not be a good way to do this in general. Hopefully you will have better ideas.

ngeiswei commented 6 years ago

That's a possible alternative. A few comments

You should probably rename domain to input-table or something. What I meant by domain was the input class of the functions corresponding to features f1, etc., including the target feature o. That is, the domain of the feature functions is the set of rows, and the codomains are the types of values the features hold (usually Boolean or Number).

I think (VariableList (ExecutionOutputLink ... should be ill-formed. If one wants to auto-generate variable lists, one may instead use Put and Quote, like

(Put
  (Quote
    (Lambda
      (Unquote
        (Variable "$vardecl"))
      (Unquote
        (Plus (Variable "$X1") (Variable "$X2")))))
  (ExecutionOutput
    (GroundedSchema "featureVariables")
    (Node "ProblemData")))
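
The substitution this relies on can be mimicked in plain Python with nested tuples. This is only a sketch of the idea: the quoting is not modelled, and `put` and `feature_variables` are hypothetical stand-ins that do not reflect how PutLink or a grounded schema actually behave.

```python
VARDECL = ("Variable", "$vardecl")

# Template of the program, with a placeholder where the declaration goes.
TEMPLATE = (
    "Lambda",
    VARDECL,
    ("Plus", ("Variable", "$X1"), ("Variable", "$X2")),
)


def feature_variables(feature_names):
    """Hypothetical generator: one variable $X1..$Xn per input feature."""
    variables = tuple(
        ("Variable", f"$X{i}") for i in range(1, len(feature_names) + 1)
    )
    return ("VariableList", *variables)


def put(template, placeholder, value):
    """Substitute every occurrence of the placeholder in a nested tuple."""
    if template == placeholder:
        return value
    if isinstance(template, tuple):
        return tuple(put(part, placeholder, value) for part in template)
    return template


program = put(TEMPLATE, VARDECL, feature_variables(["i1", "i2"]))
```

The variable declaration is generated first and only then spliced into the Lambda, so the Lambda itself never contains an unevaluated call in its declaration slot.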

Having the program take only the variables it needs, and clearly separating those variables from the input features, is a slightly higher-level abstraction, but I don't know if it would really be beneficial. The pattern matcher, for instance, would be able to catch common patterns about the programs in both representations. It's just a consideration.
