numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Improve permanence initialization scheme for spatial pooler #3233

Open · ywcui1990 opened this issue 8 years ago

ywcui1990 commented 8 years ago

@subutai

In the spatial pooler, synaptic permanences should be initialized such that a small number of learning steps can make a synapse connected or disconnected. This idea is described in the docstring at https://github.com/numenta/nupic/blob/master/src/nupic/research/spatial_pooler.py#L1130

However, the functions _initPermConnected and _initPermNonConnected do not achieve what the docstring describes. Connected permanence values are uniformly distributed between the connected threshold and the maximum permanence. With the default values synPermConnected=0.1 and synPermMax=1.0, many connected synapses start well above the threshold, and during learning it can take hundreds of steps to push them below it (see the experiment results in https://github.com/numenta/nupic.research/pull/586).
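For a sense of scale, a quick back-of-envelope sketch using the defaults above and a decrement value that comes up later in this thread:

```python
synPermConnected = 0.1
synPermMax = 1.0
synPermInactiveDec = 0.0005  # decrement value quoted later in this thread

# Uniform init on [synPermConnected, synPermMax] puts the average
# connected synapse at 0.55, i.e. 0.45 above the threshold.
meanInitPerm = (synPermConnected + synPermMax) / 2.0
print((meanInitPerm - synPermConnected) / synPermInactiveDec)  # 900.0 steps
```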

I suggest initializing the permanence values differently. Instead of a uniform distribution, use a Gaussian distribution centered on the connected threshold, with standard deviations proportional to synPermActiveInc and synPermInactiveDec, respectively. This will ensure that a small number of steps can make a synapse connected/disconnected.
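A minimal sketch of one way to implement this (the function name, the connectedPct split, and the half-Gaussian trick that keeps each synapse on its intended side of the threshold are all illustrative assumptions, not nupic API):

```python
import numpy as np

def initPermanencesGaussian(numSynapses, connectedPct, synPermConnected,
                            synPermActiveInc, synPermInactiveDec,
                            synPermMax=1.0, numSteps=5.0, seed=None):
    """Sketch: roughly connectedPct of the synapses start just above the
    threshold and the rest just below, each within a few learning steps
    of crossing it."""
    rng = np.random.default_rng(seed)
    connected = rng.random(numSynapses) < connectedPct
    perms = np.empty(numSynapses)
    # Above the threshold: spread scaled by the decrement, so roughly
    # numSteps inactive steps can disconnect a synapse.
    perms[connected] = synPermConnected + np.abs(
        rng.normal(0.0, numSteps * synPermInactiveDec, connected.sum()))
    # Below the threshold: spread scaled by the increment, so roughly
    # numSteps active steps can connect a synapse.
    perms[~connected] = synPermConnected - np.abs(
        rng.normal(0.0, numSteps * synPermActiveInc, (~connected).sum()))
    return np.clip(perms, 0.0, synPermMax)
```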

subutai commented 8 years ago

I like this idea. In order to make this change we need to validate performance on our various benchmark datasets, including Taxi and NAB.

Also, why Gaussian? Gaussian can still leave a significant number of permanence values far from connected. Why not a small uniform range around synPermConnected? That would guarantee the property.

ywcui1990 commented 8 years ago

@subutai I tried initializing the permanences with a uniform distribution in [synPermConnected, synPermConnected + 5 * synPermInactiveDec] for connected synapses and [synPermConnected - 5 * synPermActiveInc, synPermConnected] for unconnected synapses.

This ensures that a small number of iterations (<5) will make a synapse connected/unconnected.
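A minimal sketch of those two ranges (standalone function names are illustrative; in nupic this would go in the bodies of _initPermConnected and _initPermNonConnected):

```python
import numpy as np

def initPermConnectedUniform(n, synPermConnected, synPermInactiveDec,
                             rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # At most ~5 decrements above the threshold, so ~5 inactive learning
    # steps can disconnect the synapse.
    return rng.uniform(synPermConnected,
                       synPermConnected + 5 * synPermInactiveDec, n)

def initPermNonConnectedUniform(n, synPermConnected, synPermActiveInc,
                                rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # At most ~5 increments below the threshold, so ~5 active learning
    # steps can connect the synapse.
    return rng.uniform(synPermConnected - 5 * synPermActiveInc,
                       synPermConnected, n)
```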

This change does lead to faster convergence for my random SDR classification experiment, as expected. However, performance on the taxi datasets got significantly worse. I am still investigating why. It seems that a few columns quickly come to dominate the activations after the change: the activeDutyCycle distribution has a long tail, suggesting that not all columns are being used.

subutai commented 8 years ago

Hmm, I guess initializing this way also makes the columns less stable, since their connections can change quickly. Did you also try your original Gaussian proposal? Maybe that would be more stable.

ywcui1990 commented 8 years ago

It turns out the bad performance was mainly due to improper parameters rather than the initialization scheme. I had synPermActiveInc=0.0001 and synPermInactiveDec=0.0005, so the decrement was 5 times larger than the increment. That was fine with the old initialization scheme, which creates many strong connections. With the new initialization scheme, I found that the decrement has to be smaller than the increment for the SP to learn.

Basically, I found the following combinations work (with similar performance)

So we just have to tune the parameters differently for different initialization schemes. Thinking about it more, I don't think it makes sense to have decrement > increment. If there is a large set of inputs (>50), a column has to respond to multiple different inputs; if decrement > increment, it will forget previously learned inputs while learning new ones. I found that the SP ends up with too few connections after long simulations.
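To make the forgetting argument concrete, a back-of-envelope sketch under a simplifying assumption (a column that responds equally often to k distinct, non-overlapping inputs):

```python
def netDriftPerCycle(k, synPermActiveInc, synPermInactiveDec):
    # A synapse aligned with one of the k inputs is incremented once per
    # cycle through the inputs and decremented the other k - 1 times
    # (the column fires, but the synapse's input bit is off).
    return synPermActiveInc - (k - 1) * synPermInactiveDec

print(netDriftPerCycle(50, 0.0001, 0.0005))  # -0.0244: rapid forgetting
print(netDriftPerCycle(50, 0.0005, 0.0001))  # -0.0044: ~5x slower decay
```

In this idealized model both drifts are negative; real inputs overlap and columns specialize, so the increment side fares better in practice, but the relative magnitudes show why decrement > increment sheds connections so quickly.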