numenta / nupic-legacy

Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.
http://numenta.org/
GNU Affero General Public License v3.0

Improve permanence initialization scheme for spatial pooler #3233

Open · ywcui1990 opened this issue 8 years ago

ywcui1990 commented 8 years ago

@subutai

In the spatial pooler, synaptic permanences should be initialized such that a small number of learning steps can make a synapse connected or disconnected. This idea is described in the docstring at https://github.com/numenta/nupic/blob/master/src/nupic/research/spatial_pooler.py#L1130

However, the functions _initPermConnected and _initPermNonConnected do not achieve what the docstring describes. Connected permanence values are uniformly distributed between the connected threshold and the maximum permanence. With the default values synPermConnected=0.1 and synPermMax=1.0, many connected synapses start well above the threshold, and during learning it can take hundreds of steps to push them below it (see the experiment results in https://github.com/numenta/nupic.research/pull/586).
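For a sense of scale, a quick back-of-envelope sketch using the defaults above and a decrement value that comes up later in this thread:

```python
synPermConnected = 0.1
synPermMax = 1.0
synPermInactiveDec = 0.0005  # decrement value quoted later in this thread

# Uniform init on [synPermConnected, synPermMax] puts the average
# connected synapse at 0.55, i.e. 0.45 above the threshold.
meanInitPerm = (synPermConnected + synPermMax) / 2.0
print((meanInitPerm - synPermConnected) / synPermInactiveDec)  # 900.0 steps
```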

I suggest initializing the permanence values differently. Instead of a uniform distribution, use a Gaussian distribution centered on the connected threshold, with standard deviations proportional to synPermActiveInc and synPermInactiveDec, respectively. This will ensure that a small number of steps can make a synapse connected/disconnected.
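A minimal sketch of one way to implement this (the function name, the connectedPct split, and the half-Gaussian trick that keeps each synapse on its intended side of the threshold are all illustrative assumptions, not nupic API):

```python
import numpy as np

def initPermanencesGaussian(numSynapses, connectedPct, synPermConnected,
                            synPermActiveInc, synPermInactiveDec,
                            synPermMax=1.0, numSteps=5.0, seed=None):
    """Sketch: roughly connectedPct of the synapses start just above the
    threshold and the rest just below, each within a few learning steps
    of crossing it."""
    rng = np.random.default_rng(seed)
    connected = rng.random(numSynapses) < connectedPct
    perms = np.empty(numSynapses)
    # Above the threshold: spread scaled by the decrement, so roughly
    # numSteps inactive steps can disconnect a synapse.
    perms[connected] = synPermConnected + np.abs(
        rng.normal(0.0, numSteps * synPermInactiveDec, connected.sum()))
    # Below the threshold: spread scaled by the increment, so roughly
    # numSteps active steps can connect a synapse.
    perms[~connected] = synPermConnected - np.abs(
        rng.normal(0.0, numSteps * synPermActiveInc, (~connected).sum()))
    return np.clip(perms, 0.0, synPermMax)
```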

subutai commented 8 years ago

I like this idea. In order to make this change we need to validate performance on our various benchmark datasets, including Taxi and NAB.

Also, why Gaussian? Gaussian can still leave a significant number of permanence values far from connected. Why not a small uniform range around synPermConnected? That would guarantee the property.

ywcui1990 commented 8 years ago

@subutai I tried initializing the permanences with a uniform distribution in [synPermConnected, synPermConnected + 5 * synPermInactiveDec] for connected synapses and [synPermConnected - 5 * synPermActiveInc, synPermConnected] for unconnected synapses.

This ensures that a small number of iterations (<5) will make a synapse connected/unconnected.
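A minimal sketch of those two ranges (standalone function names are illustrative; in nupic this would go in the bodies of _initPermConnected and _initPermNonConnected):

```python
import numpy as np

def initPermConnectedUniform(n, synPermConnected, synPermInactiveDec,
                             rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # At most ~5 decrements above the threshold, so ~5 inactive learning
    # steps can disconnect the synapse.
    return rng.uniform(synPermConnected,
                       synPermConnected + 5 * synPermInactiveDec, n)

def initPermNonConnectedUniform(n, synPermConnected, synPermActiveInc,
                                rng=None):
    if rng is None:
        rng = np.random.default_rng()
    # At most ~5 increments below the threshold, so ~5 active learning
    # steps can connect the synapse.
    return rng.uniform(synPermConnected - 5 * synPermActiveInc,
                       synPermConnected, n)
```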

This change does lead to faster convergence for my random SDR classification experiment, as expected. However, performance on the taxi datasets got significantly worse. I am still investigating why. It seems that a few columns quickly come to dominate the activations after the change: the activeDutyCycle distribution has a long tail, suggesting that not all columns are being used.

subutai commented 8 years ago

Hmm, I guess initializing this way also makes the columns less stable, since their connections can change quickly. Did you also try your original Gaussian proposal? Maybe that would be more stable.

ywcui1990 commented 8 years ago

It turns out the bad performance was mainly due to improper parameters rather than the initialization scheme. I had synPermActiveInc=0.0001 and synPermInactiveDec=0.0005, so the decrement was 5 times larger than the increment. That was fine with the old initialization scheme, which creates many strong connections. With the new initialization scheme, I found that the decrement has to be smaller than the increment for the SP to learn.

Basically, I found the following combinations work (with similar performance)

So we just have to tune the parameters differently for different initialization schemes. Thinking about it more, I don't think it makes sense to have decrement > increment. If there is a large set of inputs (>50), a column has to respond to multiple different inputs; if decrement > increment, it will forget previously learned inputs while learning new ones. I found that the SP ends up with too few connections after long simulations.
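To make the forgetting argument concrete, a back-of-envelope sketch under a simplifying assumption (a column that responds equally often to k distinct, non-overlapping inputs):

```python
def netDriftPerCycle(k, synPermActiveInc, synPermInactiveDec):
    # A synapse aligned with one of the k inputs is incremented once per
    # cycle through the inputs and decremented the other k - 1 times
    # (the column fires, but the synapse's input bit is off).
    return synPermActiveInc - (k - 1) * synPermInactiveDec

print(netDriftPerCycle(50, 0.0001, 0.0005))  # -0.0244: rapid forgetting
print(netDriftPerCycle(50, 0.0005, 0.0001))  # -0.0044: ~5x slower decay
```

In this idealized model both drifts are negative; real inputs overlap and columns specialize, so the increment side fares better in practice, but the relative magnitudes show why decrement > increment sheds connections so quickly.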