separated and serializable encoders

htm-community / comportex

Hierarchical Temporal Memory in Clojure


separated and serializable encoders #19

Closed floybix closed 9 years ago

floybix commented 9 years ago

Encoders are the only things in Comportex HTM values that are not serializable. We should fix that so that models can be saved or sent over the wire.

Currently they are created using (reify), which makes a closure. But the main problem is how they get their particular data inputs out of the general, amorphous input-value provided to htm-step. That is done with pre-transform, which applies some arbitrary function to do the extraction.

Proposal:

Shift the task of formatting the data for encoding outside to whatever is creating the input values. An input-value as provided to htm-step would then be a structured value directly providing the data required by each input's encoder. Each input already has a keyword id specified in core/RegionNetwork so these keys could specify the input data for each.

Example. Suppose there are 2 inputs (feeding into one or more regions), called :main-input and :motor. The former is a concatenation of a category and a coordinate encoding. The latter is a single linear number encoding.

The input-value might then look like:

{:motor 42
 :main-input [:red {:coord [5.0 10.0], :radius 2.5}]
}

Each input encoder will be passed just its sub value.

The various encoder types can be made into Records or Types.
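A minimal sketch of the proposal, under stated assumptions: the `PEncoder` protocol, the `LinearEncoder` record, its bit layout and the `sense-all` helper are all hypothetical names for illustration, not Comportex's actual API. It shows each encoder receiving only the sub-value stored under its own input id.

```clojure
(defprotocol PEncoder
  (encode [this x] "Returns the set of active bit indices for value x."))

;; Hypothetical record: plain data, so it prints and reads like a map.
(defrecord LinearEncoder [width on-bits lo hi]
  PEncoder
  (encode [_ x]
    (let [span  (- width on-bits)
          frac  (/ (- (double x) lo) (- hi lo))
          ;; clamp to [0, 1] so out-of-range values map to the end buckets
          start (Math/round (double (* span (max 0.0 (min 1.0 frac)))))]
      (set (range start (+ start on-bits))))))

(defn sense-all
  "Passes each encoder only the sub-value found under its input id."
  [encoders input-value]
  (into {}
        (map (fn [[id enc]] [id (encode enc (get input-value id))]))
        encoders))

(def encoders {:motor (->LinearEncoder 127 21 0 100)})

(def bits
  (sense-all encoders
             {:motor 42
              :main-input [:red {:coord [5.0 10.0], :radius 2.5}]}))
```

Only :motor has an encoder in this sketch, so the :main-input sub-value is simply ignored; a concatenating encoder for it would follow the same pattern.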

cogmission commented 9 years ago

Isn't an Encoder a stateless entity? Why bother serializing it (if you're talking about state)? Seems the only state is configuration variables which should be "packaged" externally so that they are "re-appliable", that way you can just instantiate an encoder at the endpoint and serialize the parameters and just send those - re-applying the parameters locally to the instantiated Encoder?

Notice the above are all questions. I'm testing my logic, not really declaring a "should" :-)


floybix commented 9 years ago

Hi David, interesting question.

So the information, as you rightly point out, is more like configuration than mutable state. It defines which encoders are used and how they are combined. Then each encoder will have parameters such as width/onbits/min/max in a linear scalar encoder.

In Clojure, a record such as an encoder behaves like a map with keyword keys, tagged with its type, so when you serialize it (which is the same as printing it) you get something that looks like configuration data.

E.g. in the above example the serialized data for the inputs (as nodes in an htm network, like regions) might look like

:inputs {
  :motor #LinearEncoder {
    :width 127, :on-bits 21, :min 0, :max 100
  },
  :main-input #ConcatEncoder {
    :encoders [
      #CategoryEncoder {
        :width 100, :values [:red :green :blue]
      }
      #CoordinateEncoder {
        :width 180, :on-bits 20
      }
    ]
  }
}

So it kinda is configuration data. I guess an argument for inventing and handling a separate configuration format is that it would be less tied to the current implementation details. But a really nice thing about this is there is no special serialization mechanism at all. You just print out the whole htm object, and then read it in at the other end like any other value.
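A rough sketch of the print-and-read round trip, assuming a hypothetical `LinearEncoder` record (not necessarily how Comportex defines it). One caveat the sketch makes visible: the safe EDN reader does need to be told how to rebuild each record tag, via the `:readers` option.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical record standing in for a real encoder type.
(defrecord LinearEncoder [width on-bits min max])

(def enc (->LinearEncoder 127 21 0 100))

;; Printing a record yields a tagged literal named after its class.
(def printed (pr-str enc))

;; The tag is the record's class name, so derive it rather than hard-code
;; the namespace this happens to be evaluated in.
(def tag (symbol (.getName (class enc))))

(def restored
  (edn/read-string {:readers {tag map->LinearEncoder}} printed))
```

The restored value is an ordinary record equal to the original, so no encoder-specific serialization code is involved beyond registering the reader.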

I don't have any real experience with serialization so I may also be missing something!

Cheers


floybix commented 9 years ago

Encoders should probably be decomplected from the core functioning of an HTM network.

Let's call a network node representing encoded input a Sensor. It doesn't need to know about an encoder, only its resulting bit-set. Sensor nodes in a network could simply be containers holding bit-sets. We could define p/htm-activate-raw taking bit-sets to assign to each Sensor node, rather than a domain-related input value.

However, we would want to record the original input value, as it is used:

- in visualizing the input data for a model (world-pane in ComportexViz)
- in labelling or grouping states (state transition diagram in ComportexViz, or assessing classification or temporal pooling). Note this could be extra data beyond the values passed to encoders.

So a network (SensingNetwork?) would store (a) a map of encoders, corresponding to the Sensor nodes and (b) its input value. A function htm-activate would take a domain input value, use the encoders to generate bit-sets, and pass them to htm-activate-raw. And record the input value.
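The split described here could be sketched as follows. This is only an illustration on plain maps: the toy encoder functions, the `:senses` and `:encoders` keys, and the bit values are assumptions, not the real network representation.

```clojure
;; Hypothetical encoders: any function from a domain value to a bit-set.
(def encoders
  {:motor  (fn [n] #{(mod n 8)})
   :colour (fn [c] #{({:red 0, :green 1, :blue 2} c)})})

(defn htm-activate-raw
  "Stores bit-sets directly on the sense nodes; knows nothing of encoders."
  [htm in-bits]
  (update htm :senses merge in-bits))

(defn htm-activate
  "Encodes a domain input value, activates with the resulting bit-sets,
  and records the original input value for visualization or analysis."
  [htm in-value]
  (let [in-bits (into {}
                      (for [[id enc] (:encoders htm)]
                        [id (enc (get in-value id))]))]
    (-> htm
        (assoc :input-value in-value)
        (htm-activate-raw in-bits))))

(def htm
  (htm-activate {:encoders encoders, :senses {}}
                {:motor 42, :colour :red}))
```

After the call, the sense nodes hold only bit-sets while the untouched domain value sits alongside them under :input-value.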

Does that sound reasonable?

floybix commented 9 years ago

On second thought, the term "Sensor" wrongly suggests that it does its own encoding. A better name is "Sense".

Another question that arises is: should Senses define in themselves whether they provide proximal or distal inputs (as currently), or alternatively should Regions declare each of their inputs to be either proximal or distal (and Sense outputs be available for either purpose)? In other words, should the Senses or the Regions decide?

This would matter if Senses were used by some regions as proximal input and others as distal input. Or if a region needed to receive the same Sense as both proximal and distal input (as was suggested for Layer 4 at one point). However a work-around would be to make duplicate Sense nodes for proximal vs distal. I think I would prefer this to complicating the network connection definitions. So, staying with the current approach for now.

mrcslws commented 9 years ago

Just so I'm clear... is the SensingNetwork separate from the RegionNetwork? Would it be something like

(let [[htm sensors] (htm-activate htm sensors input)])

or are the inputs being recorded in an atom somewhere? I suspect I'm not interpreting this correctly.

I like "Sense". That name resonates with me. And removing encoders from the RegionNetwork definitely sounds like a good idea since we'll probably never stop creating encoders via higher-order functions, e.g. with pre-transform, so they'll never be serializable.

floybix commented 9 years ago

Interesting that you say "we'll probably never stop creating encoders via higher-order functions"... as I was just about to have a go at doing that (as in original description of this issue). Do you see a problem?

My current idea is not to have a SensingNetwork, rather encoders would be stored in a RegionNetwork but treated separately. Like this:

(defprotocol PHTM
  (sense [this in-value])
  (htm-activate-raw [this in-bits])
  (htm-learn [this])
  (htm-depolarise [this])
  ...)

(defn htm-activate
  [this in-value]
  (-> this
      (assoc :input-value in-value)
      (htm-activate-raw (sense this in-value))))

A RegionNetwork would gain keys :encoders and :senses instead of :inputs

mrcslws commented 9 years ago

Good call, I should have scrolled up. Ok, so no more passing arbitrary values as inputs. Only those that our defrecord encoders can handle. I can live with that.

floybix commented 9 years ago

I started implementing the proposal above in the demos but it felt wrong. It makes the representation of the world be twisted into a shape required by particular encoders.

For example, in demo coordinates-2d the world is represented like

{:x -10
 :y -20
 :vx 1
 :vy 1
 :ax 1
 :ay 1}

i.e. position, velocity, acceleration. And the world is updated each time step by a function that operates on this in a natural way. But with the new proposal we would need a transformation step before it can be used as an input value:

(fn [world-value]
  (let [{:keys [x y]} world-value]
    {:input {:coord [x y]
             :radii [radius radius]}
     :world world-value}))

This feels awkward since

- the :input-value stored in the model, used for viz or analysis, is then not the original world value, even if maybe it includes the original value somehow (here a magical key :world, yuck);
- that transformation step needs to know about the names (here :input) and types of encoders. So if you wanted to switch from one encoder to another you would have to change not only the encoder passed to region-network but also the shape of the input value.

It seems better to have encoders encapsulate how to extract their data from the world, as in pre-transform... So we are left with the serialization problem.

It turns out that almost all uses of pre-transform so far are just selecting a value under a particular key. Obviously keywords are serializable so such selector use cases are easy. Even in the coordinates-2d example above, the world value could just as well be represented with {:coord [-10 -20], ...} such that a selector :coord could be used; the radius doesn't actually depend on the input value so could instead be a parameter of the CoordinateEncoder. If we wanted to switch from a CoordinateEncoder to a pair of LinearEncoders, they could select paths [:coord 0] and [:coord 1] without changing the input value - good.
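To make the selector idea concrete, here is a small sketch. The `select` helper and the reshaped world map are hypothetical; the point is only that a keyword or a get-in path is plain data and so survives printing and reading.

```clojure
(require '[clojure.edn :as edn])

;; The coordinates-2d world, reshaped so position lives under one key.
(def world {:coord [-10 -20], :velocity [1 1], :acceleration [1 1]})

(defn select
  "Looks up a selector in the world: either a single key or a get-in path."
  [world selector]
  (if (sequential? selector)
    (get-in world selector)
    (get world selector)))

;; One CoordinateEncoder would use the whole pair...
(def xy (select world :coord))

;; ...while a pair of LinearEncoders would each select one component.
(def x (select world [:coord 0]))
(def y (select world [:coord 1]))

;; Selectors are plain data, so they round-trip through print and read.
(def path-again (edn/read-string (pr-str [:coord 1])))
```

Switching encoders then means changing selectors only; the world value keeps its natural shape.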

There may be more complex transformations or selections required in other cases. For example NuPIC's GeospatialEncoder has a radius that varies with input, so it would need to operate on a value like {:coord [x y], :radii [rx ry]}.

Or imagine a world represented by objects in a space, but encoders need to pull out a visual perspective on those objects from a particular point in space. A complex transformation. In this case it seems reasonable to first transform the world into a sensed / sensible value.

So maybe a good compromise is to attach selectors to encoders, which gives some degree of independence between encoders and the shape of the input value, while relying on the user to handle any more complex transformations required to actually compute the data.

mrcslws commented 9 years ago

Would the model hold on to the selector and the encoder separately? Or would we use something like a higher-level encoder to get-in the value?

I'm thinking we should keep them separate. Otherwise using p/decode within the model would become weird.

So it'd be in the region-network params, either a sense->path or maybe making sensory-encoders use a [sense [path encoder]] format.

floybix commented 9 years ago

Ah, you mean it would be weird because what you pass to p/encode would be a different shape to what you get out of p/decode. Yes I see.

I kind of like having an encoder as a self-contained object though. And you couldn't in general switch an encoder without also changing its selector.

What if the PEncoder protocol had a method extract which returns the encodable value from the world value, while its encode method is passed just that extracted value? That would also leave the door open to other more complex extraction functions beyond get-in, like a juxt-alike for example...
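As a sketch of what that might look like (the method names follow the comment above, but the `CategoryEncoder` record, its fields and its bit layout are made up for illustration):

```clojure
(defprotocol PEncoder
  (extract [this world] "Returns the encodable value from the world value.")
  (encode [this x] "Returns the bit-set for an already extracted value."))

;; Hypothetical encoder: each category gets its own block of bits.
;; `path` is a get-in path, so the record stays plain, printable data.
(defrecord CategoryEncoder [path values bits-per-value]
  PEncoder
  (extract [_ world]
    (get-in world path))
  (encode [_ x]
    (let [i (.indexOf ^java.util.List values x)]
      (if (neg? i)
        #{} ; unknown category: no active bits
        (set (range (* i bits-per-value)
                    (* (inc i) bits-per-value)))))))

(def enc (->CategoryEncoder [:colour] [:red :green :blue] 4))

(def bits
  (encode enc (extract enc {:position 42, :colour :green})))
```

Because `encode` only ever sees the extracted value, what goes into it has the same shape one would expect back from a decode step.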

I don't know, maybe I am complecting.

floybix commented 9 years ago

Names help. If I think of a sensor as a [selector encoder] then I'm willing to have them defined in user code and passed as arguments.

Whether the selector should satisfy a protocol or just be a get-in path, I'm not sure.

floybix commented 9 years ago

Actually this separation makes it awkward to combine sensors. Say we have {:position 42, :colour "red"} and we want to concatenate an encoding of the number from :position and the category from :colour.

Individually they would be

(def posn-sensor [(e/select :position) (e/linear-encoder 100 20 [0 99])])
(def colr-sensor [(e/select :colour) (e/unique-encoder [100] 20)])

But to concatenate them?

(def both-sensor
  [(e/juxt (e/select :position) (e/select :colour))
   (e/encat
     (e/linear-encoder 100 20 [0 99])
     (e/unique-encoder [100] 20))])

Whereas if the selector was encapsulated in an encoder one could write

(def both-encoder
  (e/encat
     (e/select :position
        (e/linear-encoder 100 20 [0 99]))
     (e/select :colour
        (e/unique-encoder [100] 20))))

That said, I'm not sure there is any reason to concatenate encoders into one sense, rather than treating them as separate senses.

floybix commented 9 years ago

My last complaint can be disregarded. In practice one would want to concatenate sensors, so we can have a function (e/sensor-cat) which just applies e/juxt and e/encat as above.
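For illustration, a sensor-cat along those lines might look like the sketch below. It is not the actual `e/` namespace: encoders are modelled as hypothetical maps with a `:width` and an `:encode` function, selectors are plain functions of the world, and core `juxt` stands in for e/juxt. Closures are used for brevity, so this version is not serializable.

```clojure
(defn encat
  "Concatenates encoders: each one's bits are shifted past the widths
  of the encoders before it."
  [& encoders]
  {:width  (reduce + (map :width encoders))
   :encode (fn [xs]
             (let [offsets (reductions + 0 (map :width encoders))]
               (set (mapcat (fn [enc x off]
                              (map #(+ off %) ((:encode enc) x)))
                            encoders xs offsets))))})

(defn sensor-cat
  "Combines [selector encoder] sensors into a single sensor."
  [& sensors]
  [(apply juxt (map first sensors))
   (apply encat (map second sensors))])

;; Toy encoders, one active bit each.
(def posn-sensor [:position {:width 100, :encode (fn [n] #{n})}])
(def colr-sensor [:colour   {:width 3
                             :encode (fn [c] #{({"red" 0, "green" 1, "blue" 2} c)})}])

(def both-sensor (sensor-cat posn-sensor colr-sensor))

(def bits
  (let [[selector encoder] both-sensor]
    ((:encode encoder) (selector {:position 42, :colour "red"}))))
```

The combined selector returns a vector of the individual selections, which is exactly the shape the concatenated encoder expects.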

You've convinced me of the need for a PExtractor protocol. Thanks!