numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Anomaly needs access to TP/SP instances #1306

Open breznak opened 10 years ago

breznak commented 10 years ago

See #1300 for reference. Experimentation within the Anomaly class would benefit from direct access to the SP/TP/Encoders and their properties.

This should be done either explicitly, by providing a link to the SP/TP instances (when calling the SP/TP/anomaly methods directly from your custom code), or implicitly, by passing a CLAModel instance and getting the values from there (when using Anomaly from, e.g., the OPF).

scottpurdy commented 10 years ago

@breznak - If we create a region for the anomaly class then you could create links from the SP or TP regions. Does that provide the functionality you want?

breznak commented 10 years ago

@scottpurdy that could be a way! I don't know here...

What do you think of #1309 ? Maybe both approaches?

scottpurdy commented 10 years ago

@breznak - You will always be able to instantiate those classes and use them without regions.

breznak commented 10 years ago

@scottpurdy

"You will always be able to instantiate those classes and use them without regions."

Yes, but I need the TP instance accessible from Anomaly. How about this?

scottpurdy commented 10 years ago

@breznak I don't know the exact values or use case you have but the Anomaly class shouldn't have references to SP/TP/CLAModel. Just set up the Anomaly constructor and compute method to take the values they need and then pass them in.
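
A minimal sketch of what that could look like, assuming the raw score is the fraction of active columns that were not predicted on the previous step. The class name `ValueOnlyAnomaly` and its parameter names are hypothetical, not NuPIC's actual API; the point is only that the caller pulls the values out of the SP/TP and hands them over.

```python
import numpy as np


class ValueOnlyAnomaly(object):
    """Hypothetical anomaly class that holds no SP/TP/CLAModel reference."""

    def compute(self, active_columns, prev_predicted_columns):
        """Return the fraction of active columns that were not predicted.

        Both arguments are arrays of column indices supplied by the caller.
        """
        active = np.asarray(active_columns, dtype=int)
        if active.size == 0:
            # Assumption: with no active columns, nothing was unexpected.
            return 0.0
        predicted = np.asarray(prev_predicted_columns, dtype=int)
        hits = np.intersect1d(active, predicted).size
        return (active.size - hits) / float(active.size)


anomaly = ValueOnlyAnomaly()
# Columns 5 and 9 were predicted, columns 2 and 11 were not.
score = anomaly.compute([2, 5, 9, 11], [5, 9, 40])
```

Because the class only sees arrays, it can be exercised in isolation, without building an SP, a TP, or a model.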

anomaly.compute() would ask for real array of permanences (not "predicted" columns >0 used now)

That seems reasonable. It also seems like you have some pretty specific use cases for the anomaly class and I worry about over-generalizing or adding too many features or code dependencies. We have to balance supporting individual use cases with keeping the code simple and modular.

So perhaps you can give some more details on what you want so we can figure out the right way to approach it.

breznak commented 10 years ago

@scottpurdy I need access to the weights of columns in Anomaly to work on https://github.com/numenta/nupic/issues/1300, where I want to change the way anomalyScore is computed (computeRawAnomalyScore can stay the same). If I change the meaning of the parameters in compute() (to these permanences), that would give me a free hand for experimentation, with no extra dependencies. Would that be OK?
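
To make the proposal concrete, here is one hypothetical way compute() could use real values instead of a binary set of predicted columns. The function name, the clipping to [0, 1], and the idea of a per-column "prediction strength" array are assumptions for illustration only; nothing here is settled in this thread or in #1300.

```python
import numpy as np


def weighted_anomaly_score(active_columns, prediction_strength):
    """Hypothetical score: 1 minus the mean prediction strength of active columns.

    prediction_strength holds one real value per column (for example, derived
    from permanences); values are clipped to [0, 1] before averaging.
    """
    active = np.asarray(active_columns, dtype=int)
    if active.size == 0:
        return 0.0
    strength = np.clip(np.asarray(prediction_strength, dtype=float), 0.0, 1.0)
    return float(1.0 - strength[active].mean())


strength = np.zeros(8)
strength[[1, 3]] = [1.0, 0.5]  # column 1 fully predicted, column 3 half predicted
score = weighted_anomaly_score([1, 3, 6], strength)
```

With a strict 0/1 array this reduces to the fraction of active columns that were not predicted, so the existing behaviour would be a special case.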

scottpurdy commented 9 years ago

@chetan51 - Can you take a look at this? I have been dragging my feet because I don't like exposing unrelated state to the anomaly class, but I don't know how to address @breznak's use case.

@breznak - Sorry for the delay on this. I kept coming back to it and not knowing what to do. Hopefully we can figure something out soon.

cogmission commented 9 years ago

You may be experiencing difficulty because the "state" doesn't really belong to the SP? Or maybe you can think about it like a separate entity, and pull the state out into another object which would then be sharable by any object? You could then think of the SP/TM or whatever as merely algorithmic processes which are separate from the data and state? This is an idea I have been toying with for the HTM.java project. You could pull it entirely out and contain it in a "connections" object or something then pass it in to multiple algorithms. Seems natural to me...?

We find it hard to hear what another is saying because of how loudly "who one is", speaks...

scottpurdy commented 9 years ago

No, I think the state in the SP/TM belongs there, but if the anomaly class is a member of the SP/TP and then has a reference to the SP/TP then it creates a circular dependency.

cogmission commented 9 years ago

The problem with that is the SP and the TM both operate on the same column/cell structure, so which one does the structure belong to? To me the answer is (should be) neither because it actually doesn't reside in either conceptually. The "state" belongs to the Column/Cell matrix not the SP and TM therefore? I'm just saying, and not trying to argue - but it seems natural to me...?

scottpurdy commented 9 years ago

Don't want to get too far off topic, @cogmission I will email you separately.

cogmission commented 9 years ago

Ok... no problem...

chetan51 commented 9 years ago

@cogmission We are moving towards using the C++ Connections data structure in all algorithms, so in a sense, we will have the data separated from the algorithms. However, the data an algorithm needs will still be a member of the algorithm instance. Furthermore, with this new approach, the SP and TM won't both operate on the same cell/column structure; they will operate on different kinds of connections (proximal and distal, respectively). So each will store only the Connections it needs, and not share them with other algorithm instances.

This will make it easier to share the TM Connections with Anomaly without sharing the whole TM instance. This can already be done with the new temporal memory implementation, but not so easily with the existing TP.py implementation.
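
A rough sketch of that idea: the anomaly code receives only a connections-like object and never sees the TM itself. `DistalConnections` and `ToyTM` are hypothetical stand-ins, not the C++ Connections class or its bindings.

```python
class DistalConnections(object):
    """Hypothetical stand-in: maps a cell index to its synapse permanences."""

    def __init__(self):
        self._permanences = {}

    def set_permanences(self, cell, permanences):
        self._permanences[cell] = list(permanences)

    def permanences_for(self, cell):
        return self._permanences.get(cell, [])


class ToyTM(object):
    """Owns its connections as a member of the algorithm instance."""

    def __init__(self):
        self.connections = DistalConnections()


def mean_permanence(connections, cells):
    """Anomaly-side helper: depends on the connections only, not on the TM."""
    values = [p for c in cells for p in connections.permanences_for(c)]
    return sum(values) / len(values) if values else 0.0


tm = ToyTM()
tm.connections.set_permanences(0, [0.2, 0.4])
tm.connections.set_permanences(3, [0.6])
# Cell 7 has no synapses and contributes nothing.
result = mean_permanence(tm.connections, [0, 3, 7])
```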

chetan51 commented 9 years ago

@breznak @scottpurdy I'm in favor of passing the TM instance to the Anomaly class for now, since we're at a point where we don't have a strict definition of anomaly computation, and it would be the most flexible approach. Later on, when we settle on the best anomaly computation method, we can narrow down the interface and only pass what we need. I think this is an okay stop-gap approach.

cogmission commented 9 years ago

Correct me if I'm wrong ok?

If I have it right, proximal and distal refer to dendrites.

Proximal dendrites form the inputs to a given Column. In the old SP, there was only one proximal dendrite per column; this may be increased in the "new" approach. Each dendrite has multiple synapses. You may or may not choose to model the actual synapses, and could instead model just the permanence values and the connections to the bits of an input vector by index.

Distal dendrites form the inputs from a given Cell to other Cells. Each of these can have a growing and shrinking set of Synapses each with a permanence value associated.

Regardless of all of this, synapse permanences are associated with a given proximal or distal dendrite and each dendrite has to be associated with either a column or cell (depending on their type). Additionally, each Column is associated with cells (Cells have to exist at least conceptually in order to aggregate and associate distal dendritic synapse permanence values and relate those values to a given Column or cell activation)

So whether the constructs are explicitly modeled or just indexed as array values, they exist conceptually...
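
As an illustration of the "just indexed as array values" option, the sketch below models one proximal dendrite per column as a row of permanences over the input bits, with no explicit Synapse or Column objects. The array shapes, the 0.2 connection threshold, and all names are assumptions chosen for the example.

```python
import numpy as np

NUM_COLUMNS = 4
NUM_INPUT_BITS = 6
CONNECTED_THRESHOLD = 0.2  # assumed value, for illustration only

# Row i is the proximal dendrite of column i; entry [i, j] is the permanence
# of the potential synapse from column i to input bit j.
rng = np.random.RandomState(42)
proximal_permanences = rng.uniform(0.0, 0.4, size=(NUM_COLUMNS, NUM_INPUT_BITS))


def overlaps(permanences, input_vector):
    """Count, per column, the connected synapses that see an active input bit."""
    connected = np.asarray(permanences) >= CONNECTED_THRESHOLD
    active_bits = np.asarray(input_vector) > 0
    return (connected & active_bits).sum(axis=1)


input_vector = np.array([1, 0, 1, 1, 0, 0])
column_overlaps = overlaps(proximal_permanences, input_vector)
```

The columns exist here only as row indices, which is the sense in which they "exist conceptually" without being modeled as objects.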

If they exist conceptually, then their existence is present in both the SP and TM if you're allowing them to hold state. If they are in a Connections object that in essence separates them from the algorithms - then what you are saying is that the algorithms maintain a reference to their associated data structures within the Connections object. This latter organization is the one you began with the TM.

The SP and TM references to their needed data can be handled in one of two ways if the data is to be separated from the algorithms (which I believe is what you were saying). The Connections (holding all data and connections information) can be passed in to each and every algorithmic method of either the SP or TM, -or- the SP and TM can be "given" a reference to the data structures they require within the Connections object.
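
The two options could look roughly like this. Both classes are hypothetical toys that "learn" by incrementing a single permanence; they only show where the Connections reference lives.

```python
class Connections(object):
    """Hypothetical container for permanences, keyed by (segment, synapse)."""

    def __init__(self):
        self.permanences = {}


class PerCallAlgorithm(object):
    """Option 1: stateless; Connections is passed in to every method."""

    def reinforce(self, connections, key, delta=0.1):
        perms = connections.permanences
        perms[key] = perms.get(key, 0.0) + delta


class HeldReferenceAlgorithm(object):
    """Option 2: the algorithm is given a reference once and keeps it."""

    def __init__(self, connections):
        self.connections = connections

    def reinforce(self, key, delta=0.1):
        perms = self.connections.permanences
        perms[key] = perms.get(key, 0.0) + delta


shared = Connections()
PerCallAlgorithm().reinforce(shared, (0, 0))
HeldReferenceAlgorithm(shared).reinforce((0, 0))
# Both algorithms updated the same entry in the shared object.
```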

The other choice is to conceptually duplicate the structure across the SP and TM. I know that they operate on different data, but the TM still needs something called a Column even if it only is concerned with Cell synapses because its result data comes from the activation of columns and columns are in essence the SDR being passed. So why not just have 1 of them per Layer instead of conceptually duplicating their essence across algorithmic instances? I don't get it? Or I guess I don't agree with it. It seems like a complication that doesn't have to be there and it limits the flexibility and understandability of the structure?

cogmission commented 9 years ago

@chetan51 we posted at the same time and I wanted to make sure you didn't miss what I wrote?

cogmission commented 9 years ago

Hey Chetan, I guess this is what's throwing me off:

"However, the data an algorithm needs will still be a member of the algorithm instance."

Does this mean a given algorithm and the Connections object will both reference the same data structure? Or will a copy of the same thing exist in both the Connections object and any given Algorithmic instance?

chetan51 commented 9 years ago

The SP algorithm would store the proximal dendrites and permanences of synapses on them (in a Connections data structure), and the TM algorithm would store the distal dendrites and permanences of synapses on them (in a separate Connections data structure). Both Connections data structures represent connections on the same conceptual set of cells / columns, but are stored separately. We don't store the cells / columns explicitly anywhere (other than that the SP / TM algorithms would know the dimensionality of the cell structure and other such things).
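
One way to picture "not storing the cells / columns explicitly" is plain index arithmetic: if the algorithm knows the number of cells per column, a cell index determines its column. The layout below (cells numbered contiguously within each column) and the dictionary-based stores are assumptions for illustration, not a description of how the Connections data structure is laid out.

```python
CELLS_PER_COLUMN = 4  # assumed dimensionality, known to the algorithm


def column_for_cell(cell_index, cells_per_column=CELLS_PER_COLUMN):
    """Derive the column from a cell index instead of storing Column objects."""
    return cell_index // cells_per_column


def cells_for_column(column_index, cells_per_column=CELLS_PER_COLUMN):
    """List the cell indices that belong to a column."""
    start = column_index * cells_per_column
    return list(range(start, start + cells_per_column))


# Two separate stores that refer to the same conceptual structure by index only.
proximal = {2: [0.3, 0.1]}      # column index -> permanences (SP side)
distal = {9: [0.5], 11: [0.2]}  # cell index -> permanences (TM side)

columns_with_distal = sorted({column_for_cell(c) for c in distal})
```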

chetan51 commented 9 years ago

I am open to the idea of storing the Connections independently of the algorithms. In fact, that is why the new TM implementation is written in a "functional" style, taking in the Connections instance in every function. But given the way the Network API currently works, it's easiest to store the relevant Connections in the relevant algorithms, and encapsulate that in a Region. This keeps things modular and focused.
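
The combination described here can be sketched as a pure function that takes the connections explicitly, wrapped by a region-like object that owns them. `ToyRegion` and `active_segments` are hypothetical names; this does not reproduce the Network API or the actual TM functions.

```python
def active_segments(connections, active_cells, threshold=2):
    """Functional core: return segments with enough active presynaptic cells.

    connections maps a segment id to the set of presynaptic cell indices.
    """
    active = set(active_cells)
    return sorted(seg for seg, cells in connections.items()
                  if len(cells & active) >= threshold)


class ToyRegion(object):
    """Wrapper that stores the relevant connections next to the algorithm."""

    def __init__(self, connections):
        self._connections = connections

    def compute(self, active_cells):
        # The region supplies its own connections to the functional core.
        return active_segments(self._connections, active_cells)


region = ToyRegion({"s0": {1, 2, 3}, "s1": {3, 4}, "s2": {7, 8}})
result = region.compute([2, 3, 4])
```

The function stays testable on its own with any connections object, while callers of the region never have to pass one around.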

cogmission commented 9 years ago

I will try to preserve the same methods and functionality as I have done so far with the Java version, but I want to also keep with the original plan, which was to keep a firm isolation (your idea from the start! :-) ) of the data, state, structure and algorithms. I am hoping because I'm starting from scratch I can preserve the purity of this idea and still mirror the Network API methods or at least the spirit of them so that I can duplicate all tests. The original purpose of this API was to present Numenta with an identical code base that could be "matured" in lockstep with their original codebases - but it's hard to keep from making little "improvements" here and there. I will try though.
