guillaume-chevalier / Spiking-Neural-Network-SNN-with-PyTorch-where-Backpropagation-engenders-STDP

What about coding a Spiking Neural Network using an automatic differentiation framework? In SNNs, there is a time axis and the neural network sees data throughout time, and activation functions are instead spikes that are raised once the pre-activation crosses a certain threshold. Pre-activation values constantly fade if neurons aren't excited enough.
https://guillaume-chevalier.com/spiking-neural-network-snn-with-pytorch-where-backpropagation-engenders-stdp-hebbian-learning/

To spike or not to spike #1

Open RR5555 opened 5 years ago

RR5555 commented 5 years ago

Small disclaimer: I am yet another PhD student whose main scope of research happens to be SNNs. What I am about to say below is only based on what I currently think I know. I might be very wrong, so please do not judge me too harshly. Also, when speaking science, I am quite blunt; please do not take any offence, none is intended.


Spiking neurons can encompass a lot of different models. Whether what matters is the mere presence of a spike or the specific shape of that spike is still up for debate. In computational neuroscience, working with models that describe the spike shape is quite common. However, on the learning and deep learning side, the term SNN seems to be used mostly for models in which the only information transmitted between neurons is all-or-none, timed bits. From that latter perspective, what you coded is not really an SNN (even though, from a computational neuroscience perspective, it could be discussed, and could perhaps be treated as a kind of spiking model). Still from that latter perspective, you would have to replace your relu by a heaviside, that is, output 'do_penalize_gate' instead of 'outer_excitation', to be in tune with what is usually implied by SNN.
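For concreteness, a minimal sketch of what I mean (not your repo's actual code; the threshold and tensors here are made up, only the two variable names are borrowed from your description):

```python
import torch

threshold = 1.0                                              # made-up firing threshold
pre_activation = torch.randn(4, 8)                           # stand-in for the accumulated input

outer_excitation = torch.relu(pre_activation - threshold)    # graded value: what a relu forwards
do_penalize_gate = (pre_activation > threshold).float()      # all-or-none bit: what an SNN would forward
```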

On the computational neuroscience side, the problems are many-fold:

All that to say that the debate is still open, and that, in an extended sense, what you coded could "maybe" be considered an SNN. And how exactly real neurons learn is still not totally clear, including the question of the substrate of the learning system (only the neurons? neurons plus glial cells? neuron membranes? chemical pathways? DNA? What if your learning rule is a conditional code on a small patch of DNA, triggered through chemical pathways that result from local activations of the membrane voltage? Implicitly, this is also the question of the scale at which the information is "really" processed and stored, and at which the "intelligence" resides).

Now, in the deep learning papers and learning models of SNNs, "SNN" refers more to all-or-none bit transmissions between neurons. In that literature, there are already papers using some kind of exponential kernel to enable back-propagation in SNNs, and others with other spiking neuron models (IF, LIF, ...), other kernels, different resets (by subtraction, to zero, ...), possibly using back-prop or other mechanisms, some converting from ANN to SNN, etc. Because the more advanced papers and theses on the subject require a tad more mathematics, I have not yet gone too deep into the more complex papers. (Currently focusing on catching up on the literature: Izhikevich, Dayan, Abbott, Gerstner, Ermentrout, Rieke, Bialek, Florescu.) However, I am fairly certain that some papers have already drawn and put into equations links between STDP and back-prop using at least specific kernels: e.g. 'BP-STDP: Approximating Backpropagation using Spike Timing Dependent Plasticity', or even by Bengio himself, 'STDP-Compatible Approximation of Backpropagation in an Energy-Based Model'. Warning: I have not yet read the papers just mentioned; they are mainly here for the sake of demonstrating the point, though both are on my to-read list, especially the second one, and others dealing with energy-based models.
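To illustrate what I mean by IF/LIF models and the different resets, here is a toy discrete-time sketch (mine, not taken from any of those papers; all constants are arbitrary):

```python
import torch

def lif_step(v, input_current, threshold=1.0, decay=0.9, reset="subtract"):
    """One discrete-time leaky integrate-and-fire (LIF) update.

    Setting decay=1.0 gives a plain integrate-and-fire (IF) neuron.
    """
    v = decay * v + input_current            # leaky integration of the membrane potential
    spike = (v >= threshold).float()         # all-or-none output
    if reset == "subtract":
        v = v - spike * threshold            # reset by subtraction
    else:
        v = v * (1.0 - spike)                # reset to zero
    return spike, v
```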

About the SNN not being an RNN: again, it is tricky, depending on what you define an SNN to be. But the version you implemented is, to me, most definitely an RNN; I could draw a NN graph of it with recurrent connections. An RNN is simply a NN with loops; it does not say anything about where the loop should be. Once you have a loop, you have a notion of a stored state (a memory system), which is roughly what the membrane voltage of simplified spiking neuron models also provides. If you go for a Hodgkin-Huxley model, then it is not an RNN, an RNN being a discrete system whereas the HH model is described continuously through a system of differential equations.
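Unrolled over time, such a simplified spiking neuron looks exactly like an RNN cell whose hidden state is the membrane voltage; a self-contained toy sketch (again, arbitrary constants and random inputs):

```python
import torch

decay, threshold = 0.9, 1.0
batch, features, T = 4, 8, 100

v = torch.zeros(batch, features)            # recurrent state: the membrane voltage
spike_train = []
for t in range(T):
    x_t = torch.randn(batch, features)      # stand-in input at time t
    v = decay * v + x_t                     # the "loop": state fed back at every step
    s_t = (v >= threshold).float()          # all-or-none spike
    v = v - s_t * threshold                 # reset by subtraction
    spike_train.append(s_t)
spike_train = torch.stack(spike_train)      # shape (T, batch, features)
```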

On the brain rhythms, you might take a look at Jun Tani's work on multiple-timescale NNs, such as in 'Emergence of Functional Hierarchy in a Multiple-Timescale Neural Network Model: A Humanoid Robot Experiment'. Also, though I am not there yet in my reading, you might have a look at neural oscillations in SNNs (but these are usually the latest chapters of the spiking neuron books). Jun Tani's work is based in large part on predictive coding (learning from the divergence between what the system's model of the world predicts and the real sensory information or its representations), which is not directly the same as auto-encoders plus CD. But is it that different? (Genuine question; I am not well-versed enough in auto-encoders and CD to answer that yet.)

Last but not least, you can still check whether your specific implementation and ideas have already been put into a paper, and if they have not, develop the mathematical justification for the BP-STDP link in that particular configuration of yours, or try your other ideas experimentally and check whether those have been published. My point here is in no way to stop your work on this (the more you develop new things, the better for me and others), but merely to give you feedback that I hope might be of some use to you.

guillaume-chevalier commented 5 years ago

Haha, "yet another PhD student", hilarious. Thanks for the feedback!

You say "you would have to replace your relu by a heaviside": you are totally right. But the Heaviside step's derivative is 0 almost everywhere, so I absolutely needed something else for back-propagation. Perhaps a heaviside with a tricked derivative of 1 could do the job?
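Something like the following could be one way to do it (a rough sketch of the idea using PyTorch's `autograd.Function`; the clamping window in the backward pass is just one common choice, not something I've tested here):

```python
import torch

class HeavisideSTE(torch.autograd.Function):
    """Heaviside step forward, 'tricked' (straight-through) derivative backward."""

    @staticmethod
    def forward(ctx, pre_activation):
        ctx.save_for_backward(pre_activation)
        return (pre_activation > 0.0).float()        # all-or-none spike

    @staticmethod
    def backward(ctx, grad_output):
        (pre_activation,) = ctx.saved_tensors
        # Pretend the derivative is 1 near the threshold and 0 far from it,
        # so gradients can still flow through the spiking non-linearity.
        window = (pre_activation.abs() < 1.0).float()
        return grad_output * window

spike = HeavisideSTE.apply

x = torch.randn(4, 8, requires_grad=True)
y = spike(x)
y.sum().backward()                                   # x.grad is the windowed pass-through
```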

For the RNN thing, yeah, it's debatable. I prefer not to call this an RNN as it's not literally connected to itself with another weight "w", although I see how you could say it's somehow like an RNN. I'd also like to see the SNN as not being a discrete system (e.g. it could be implemented with a thread per neuron, where the times at which firings happen would be the only events processed by the threads, each firing launching an update of every post-synaptic neuron's thread it's connected to). Those "events" could happen at any continuous time value. Although thinking like that is a bit "pixelated", it'd be quite energy-efficient in computers.
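The same event-driven idea could be sketched single-threaded with a priority queue of spike events carrying continuous timestamps (everything below, the neuron names, delays, and topology, is invented just to show the shape of it):

```python
import heapq

# Spike events: (continuous time, neuron id); only these get processed.
events = [(0.37, "n1"), (0.52, "n2")]
heapq.heapify(events)

# Post-synaptic connections with made-up transmission delays.
synapses = {"n1": [("n2", 0.10)], "n2": [("n3", 0.25)], "n3": []}

t_max = 5.0
while events:
    t, neuron = heapq.heappop(events)
    if t > t_max:
        break
    # A real simulator would update the post-synaptic membrane potentials here
    # and only push an event if the target actually crosses its threshold;
    # this sketch just forwards one event per spike.
    for target, delay in synapses[neuron]:
        heapq.heappush(events, (t + delay, target))
```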

yaceben commented 4 years ago

Hi,

As a former PhD student who stubbornly missed the boat on all the Deep Learning hype, preferring SNNs but nonetheless staying in the classical stochastic world, I thought this thesis was quite interesting at the time. I haven't read it in a while, but maybe some concepts could be borrowed here.

Bekolay, T. (2016). Biologically Inspired Methods in Speech Recognition and Synthesis: Closing the Loop (PhD thesis, University of Waterloo). Retrieved from https://uwspace.uwaterloo.ca/handle/10012/10269

Cheers