htm-community / comportex

Hierarchical Temporal Memory in Clojure

Rethink temporal pooling - discrete transitions at higher levels. #32

Closed floybix closed 9 years ago

floybix commented 9 years ago

This rewrite comes from taking seriously the need for sequence learning at higher levels. At higher levels we have temporal slowness: cells stay active for longer, i.e. several time steps. If there is a uniform cortical algorithm then we need the usual sequence learning to work under temporal slowness. I think that means discrete transitions.

I define a threshold on the fraction of a layer's input that is stable; when that fraction is reached the layer is engaged. Only when a layer is newly engaged does it replace its active columns based on the stable input. At other times, those active columns and cells simply continue to stay active. The exception is any column with a definite (high) match to the input bits; the relative influence of such columns over continuing columns is controlled by a parameter (temporal-pooling-max-exc).
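
Here's a minimal sketch of that engagement logic, just to pin the idea down. Apart from `temporal-pooling-max-exc` (mentioned above), the function names and data shapes are illustrative, not Comportex's actual API.

```clojure
(defn top-columns
  "The n columns with the highest overlap scores (hypothetical helper)."
  [col-overlaps n]
  (set (map first (take n (sort-by val > col-overlaps)))))

(defn engaged?
  "A layer is engaged when the fraction of its feed-forward input bits
  that are stable meets a threshold."
  [stable-ff-bits ff-bits stable-threshold]
  (>= (/ (count stable-ff-bits) (max 1 (count ff-bits)))
      stable-threshold))

(defn select-active-columns
  "On the step where the layer becomes newly engaged, replace the active
  columns based on overlaps with the stable input. Otherwise keep the
  continuing columns, letting in only columns whose direct overlap with
  the input exceeds the persistent pooling excitation."
  [prev-active-cols stable-col-overlaps col-overlaps newly-engaged? n-cols spec]
  (if newly-engaged?
    (top-columns stable-col-overlaps n-cols)
    (into prev-active-cols
          (for [[col exc] col-overlaps
                :when (> exc (:temporal-pooling-max-exc spec))]
            col))))
```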

Active columns learn on proximal dendrites as long as the layer is engaged, meaning it has continuing stable input.

For learning on distal dendrites, define:

- learning cells: winner cells that have just become active (newly active winners);
- learnable cells: winner cells that have just turned off.

Each time step, the learning cells can then grow synapses to the learnable cells. This applies in all layers, not just higher temporal-pooling layers. Notably, it has a big effect on gradual continuous sequence learning, such as with the coordinate encoder. That will probably cause problems, because there might not be enough coincidences of some cells starting while others are stopping. Maybe the learnable cells should remain learnable for a few time steps.
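
To make the rule concrete, here is a sketch of one time step. The cell sets and the grow-synapses function are stand-ins for illustration, not the real comportex.cells code.

```clojure
(require '[clojure.set :as set])

(defn distal-learning-step
  "Grow distal synapses from each newly active winner (learning cell)
  towards the winners that have just turned off (learnable cells)."
  [{:keys [winner-cells active-cells prev-winner-cells prev-active-cells]}
   grow-synapses]
  (let [learning-cells  (set/difference winner-cells prev-active-cells)
        learnable-cells (set/difference prev-winner-cells active-cells)]
    (doseq [cell learning-cells]
      (grow-synapses cell learnable-cells))))
```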

I'm not sure the old way was much better, because it would end up with a lot of cells connecting to the other cells representing the same coordinate, which is not useful sequence information.

Obviously, this is all experimental. I haven't really experimented with it to see how the temporal pooling properties hold up. But what we had didn't work anyway, so might as well replace it.

floybix commented 9 years ago

With this change we grow distal synapses from source A to target B if their activation lines up in series, like

AAAA
    BBBB

Maybe we should also grow when they overlap but are clearly ordered, like

AAAA
  BBBB

and maybe

AAAA
  BB

This could be implemented by defining learning cells to be newly active winners (as with this change), but allowing source learnable cells to be all winners, not just the ones turning off.

However, that would allow this connection to be learned, which is questionable:

AAAA
 BB

floybix commented 9 years ago

OK, I think we need to handle continuous sequences, so I'm going with what I just wrote: allowing distal learning between cells that overlap in time, by making all winner cells learnable (but winners still only learn when they first become active).

Just looking at the coordinate encoder demo, there is a lot of non-sequential learning going on: while a column stays active, the winner cell in that column often switches under the influence of distal (predictive) excitation. Since these are "new" winner cells they do distal learning. This ends up with a lot of noise.

An obvious solution would be to force the winner cell in a column to stay fixed until the column turns off. However, I don't want to do that, because the initial context of a column might be wrong, and e.g. top-down feedback should be able to resolve the context to the correct cell in a column even while the column stays active.

I think instead we could allow the winner cell in a column to switch according to total excitation (as now), but if it is in a continuing active column it should not be learning.
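
Roughly, in code (illustrative names; cells are written here as [column, cell-index] pairs, and this is a sketch of the proposed rule rather than the actual implementation):

```clojure
(defn column-winner
  "The cell of `col` with the highest total excitation; the winner may
  therefore switch while the column stays active."
  [total-exc cells-of-col col]
  (apply max-key #(get total-exc % 0.0) (cells-of-col col)))

(defn learning-winner?
  "A winner cell is a learning cell only if its column was not already
  active at the previous time step."
  [[col _ci] prev-active-columns]
  (not (contains? prev-active-columns col)))
```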

cogmission commented 9 years ago

Hi Felix,

I don't mean to chime in where I'm not invited, but it seems you are doing some very critical thinking and experimentation. Your implementations are far ahead of the curve, and I don't understand why you choose not to do this kind of thinking in the general theoretical forum. It doesn't seem to be an exclusively Comportex-oriented subject? I may be wrong, but it seems like you would get the benefit of a wider knowledgeable audience, and the general public would get the benefit of your excellent work?

Just a thought...

Cheers, David

floybix commented 9 years ago

Well, that didn't help much, because the winners kept switching even when not learning. We really don't want to break the connection between learning cells and learnable cells. But in combination with a change that keeps winner cells stable when all else is equal, we are getting somewhere (that change was aimed at higher levels but applies equally to gradual continuous sequences).

In fact, how did I not realise this before? In gradually changing continuous sequences we have cells remaining active over time, say while the input stays within a cell's coordinate range. That shares a lot of properties with temporal pooling at higher levels. Look at this, in the single-layer coordinates-2d demo (time goes right, columns sorted for clarity): [screenshot: single-layer coordinates-2d demo, 2015-09-29]

It shows we are now correctly predicting the onset of active cells. The red (bursting) states are just the initially-predicted ones continuing. And maybe we don't want to keep predicting them... just as we don't want to keep predicting the current state in a stable temporal pooling layer.

Hmm. Anyway I'm going to bed now.

mrcslws commented 9 years ago

@cogmission Speaking for myself, I think of nupic-theory as "I want Jeff to read this". These side forums are a staging area for legitimized paragraphs :)

cogmission commented 9 years ago

@Marcus I see. I just didn't know who was reading this, and I don't see any responses (other than yours), which makes me think this work may not get the benefit of feedback? It seems to me that Numenta will eventually focus their attention on stability, and the ground that Felix covers may go to waste unless he simply solves everything first? Others should get the benefit of your and Felix's hard work without having to think through the same repetitive process... Just a thought...

floybix commented 9 years ago

@cogmission a good point. I will ask for help from the nupic-theory list as you suggest, but I will see if I can consolidate my thoughts a bit first to avoid wasting everyone's time.

floybix commented 9 years ago

I was confusing myself about gradual continuous sequences. The way the learning actually works is really weird. Because there is some level of stability in columns, winner cells tend to have similar distal inputs between time steps, so they stay active; they also continue to learn on distal segments, extending them to reflect the slight changes between steps. At some point the context changes past a threshold and a new winner takes over. So you get these self-organising transitions. It is easier to understand in the interactive demo.

I don't think it's perfect, as there is a lot of redundant learning going on, but it does seem to work quite well, at least visually. Sampling rate is an open question. (Time goes right, columns sorted for clarity.) [screenshot: coordinates demo with all winners learning]

So anyway, all my earlier comments in this thread can be ignored and replaced with this rule for distal learning: all winner cells are learnable, but a winner cell only learns (grows distal synapses) when it first becomes active.

floybix commented 9 years ago

There is a problem with this whole approach which is obvious in retrospect. (In fact I now remember that I realised this before, in my first attempt at temporal pooling, but forgot about it.)

Recall that as soon as the first level becomes predictable, the higher (temporal pooling) layer "engages" and fixes its active columns; they then keep growing new dendrites to encompass the following predictable sequence.

The problem is, just because a sequence is recognised as predictable does not mean it is resolved into a unique identity, and of course it cannot in general be resolved uniquely until the whole sequence has been seen. For example, seeing the letters "t,h,e" vs "t,h,r,e,e": the sequence is predicted at "h" but not uniquely. If we freeze the pooled representation at that point it will be identical for "the" and "three".

One way to go is Numenta's "Union Pooler" approach - I only have a vague and possibly incorrect understanding: throughout a predicted sequence, more and more cells get added to the temporal pooling representation. Therefore the final representation should have some unique component. The nice part is that the union representation should include bits from all steps of a sequence, so you get semantic overlap with similar sequences. I'm not sure how you get this to be stable enough to model higher level sequences.

Another way might be to use an attention-like mechanism to "engage" the temporal pooling once the predictions have been resolved down to a single path.

cogmission commented 9 years ago

> For example seeing the letters "t,h,e" vs "t,h,r,e,e". The sequence is predicted at "h" but not uniquely. If we freeze the pooled representation at that point it will be identical for "the" and "three".

Correct me if I'm wrong, but shouldn't the behaviour be either a prediction of "the" or a prediction of "three" until the third letter is reached? Isn't that OK? Also, doesn't the scope of prediction come into play? What I mean is, aren't there times when we remember the "gist" of something but mistake or confuse one or more details? Should we be thinking of the pooler as a resolver or as a generalizer?

When I originally read your mail, my first thought was that the HTM doesn't necessarily "resolve" anything. It feels more true to me that the HTM merely predicts and later receives reinforcement in the form of a successful prediction. If we think of the HTM as "resolving" things, then we get into the arena of meta-oversight. My inclination is to think that no oversight is happening at all, and that the system merely functions like a monad without contextual awareness, with awareness emerging from the prediction mechanism?

Also the choice of "the" or "three" seems like it would be dependent on the previous input (maybe way way back) to provide successful context recognition?

Just some random thoughts...

Cheers, David

floybix commented 9 years ago

So I did a kind of implementation of a union pooler.
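
For reference, the gist of the idea in a few lines. This is only a toy sketch of union-style pooling as described above (it ignores excitation levels and any decay), and the names are made up rather than taken from the actual commits.

```clojure
(defn union-pool-step
  "While the input remains predictable, union newly active cells into the
  persisting pooled representation; when prediction breaks down, start over."
  [union-cells newly-active-cells input-predictable?]
  (if input-predictable?
    (into union-cells newly-active-cells)
    (set newly-active-cells)))
```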

floybix commented 9 years ago

A puzzle that comes up when we think about sequence learning at higher levels:

How do we maintain a "bursting" column state in a higher level layer? (If a transition was not predicted, the newly activated columns should burst, activating all their cells / contexts.) I'm assuming that the same mechanism should apply at all levels.

Under temporal pooling, cells in a column may stay active for several time steps; for sequence learning this could be either a single predicted cell, or many bursting cells. I make this work by setting a level of persistent temporal pooling excitation on all newly active cells.
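
In sketch form (the excitation map and the amount parameter are illustrative):

```clojure
(defn add-tp-excitation
  "Give every newly active cell the same persistent temporal-pooling
  excitation, whether it is a single predicted cell or a bursting column."
  [tp-exc newly-active-cells tp-exc-amount]
  (merge tp-exc (zipmap newly-active-cells (repeat tp-exc-amount))))
```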

Apart from keeping multiple predictions open, the other role of bursting is in defining the feed-forward outputs from the layer as being "stable" or not, which is used for temporal pooling at still higher levels. This seems to suggest we should define bursting simply by whether all cells in a column are (continuing to be) active. However, that definition can't apply in the first level if we have one cell per column. And it seems to lose the essence of "bursting", which is defined by a lack of predictive depolarising potential.

In practice we seem to be left with a composite definition of bursting: by predictive potential on newly-active / first-level steps, and all-cells-per-column during temporal pooling phases.
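
That composite definition, written out as a sketch (names illustrative):

```clojure
(defn bursting?
  "For a newly active column, bursting means it was not predicted; for a
  column continuing under temporal pooling, bursting means all of its
  cells are active."
  [col newly-active? predicted-cols active-cells-by-col cells-per-column]
  (if newly-active?
    (not (contains? predicted-cols col))
    (= cells-per-column (count (get active-cells-by-col col #{})))))
```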

floybix commented 9 years ago

@mrcslws please review. Um sorry about the 13 commits... do you think I should squash them?

mrcslws commented 9 years ago

I'm taking some time to study this change and become opinionated about it. I should have something coherent to say tomorrow (Sunday my time). The 13 commits are fine by me. Feel free to merge without me, I can always comment on commits.

floybix commented 9 years ago

Sure, there's no urgency about it. And thanks.

mrcslws commented 9 years ago

:+1:

floybix commented 9 years ago

Just merging to carry on with experiments, not because this is finished by any stretch of the imagination.