ridgerchu / SpikeGPT

Implementation of "SpikeGPT: Generative Pre-trained Language Model with Spiking Neural Networks"
BSD 2-Clause "Simplified" License

Linking paper and code #11

Closed adirajagopal closed 9 months ago

adirajagopal commented 10 months ago

Hi, I had a couple of questions about the paper and how it links to the code here.

1. Do you have any materials on how you derived Eq.10 in the paper from Eq.4?
2. I'm also a little unclear on how the CUDA function `kernel_forward` in `wkv_cuda.cu` implements Eq.10. Could you provide some pointers on that, please?

Thanks!

ridgerchu commented 9 months ago

Hi!

For the first question, please refer to Fig.2. In this structure, 'W' can be viewed as a convolution kernel of the same size as 'K/V', which turns the serial recurrence into a convolution that can be computed in parallel. We have an animation that illustrates this concept; you can find it in this talk recording between 25:00 and 28:00, where I discuss this particular problem in detail.
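To make the convolution view concrete, here is a minimal sketch in plain Python. It assumes an RWKV-style WKV with a scalar decay `w` (negative) and a current-token bonus `u` (both values below are made up for illustration, not taken from the paper): the past-token sums in the serial recurrence are exactly a 1-D convolution of `e^k * v` (and `e^k`) with the decay kernel `[e^{0w}, e^{1w}, ...]`, which is the sense in which 'W' acts as a kernel the same length as K/V.

```python
import math

def conv(x, kern):
    # Plain 1-D convolution: out[t] = sum_{i<=t} x[i] * kern[t-i].
    return [sum(x[i] * kern[t - i] for i in range(t + 1)) for t in range(len(x))]

T = 6
w, u = -0.5, 0.3                      # hypothetical decay / bonus values
k = [0.2, -0.1, 0.5, 0.0, -0.3, 0.4]  # toy key sequence
v = [1.0, 2.0, -1.0, 0.5, 0.0, 3.0]   # toy value sequence

# 'W' viewed as a kernel the same length as K/V: [e^{0w}, e^{1w}, ...]
decay = [math.exp(w * n) for n in range(T)]
ekv = [math.exp(ki) * vi for ki, vi in zip(k, v)]
ek = [math.exp(ki) for ki in k]

# Parallel form: past-token numerator/denominator via one convolution,
# shifted by one step so token t only sees tokens i < t.
num_past = [0.0] + conv(ekv, decay)[:T - 1]
den_past = [0.0] + conv(ek, decay)[:T - 1]
y_par = [(n_p + math.exp(u + kt) * vt) / (d_p + math.exp(u + kt))
         for n_p, d_p, kt, vt in zip(num_past, den_past, k, v)]

# Serial form for comparison: the explicit double loop over past tokens.
y_ser = []
for t in range(T):
    n = math.exp(u + k[t]) * v[t]
    d = math.exp(u + k[t])
    for i in range(t):
        c = math.exp((t - 1 - i) * w + k[i])
        n += c * v[i]
        d += c
    y_ser.append(n / d)

assert all(abs(a - b) < 1e-9 for a, b in zip(y_ser, y_par))
```

The convolution can of course be done with an FFT or a batched matrix product in practice; the point is only that the serial sum and the kernel view give the same output.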

Regarding the second point, language models often encounter numerical overflow during training because of the exponentials involved. To address this, we introduced an extra variable 'pp' that tracks the running maximum exponent, so all exponentials are evaluated relative to it. The overall functionality still matches Eq.10. For a more comprehensive explanation, please consult our RWKV EMNLP paper, particularly Eq.23-28, where this is discussed thoroughly.
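Here is a small Python sketch of that trick, mirroring the structure of `kernel_forward` (the variable names `aa`, `bb`, `pp` follow the kernel; the decay/bonus values are made up). `aa`/`bb` hold the running numerator/denominator, stored relative to the running maximum exponent `pp`, so no individual `exp` can overflow:

```python
import math

def wkv_naive(w, u, k, v):
    # Direct evaluation of the Eq.10-style WKV; overflows for large k.
    y = []
    for t in range(len(k)):
        n = math.exp(u + k[t]) * v[t]
        d = math.exp(u + k[t])
        for i in range(t):
            c = math.exp((t - 1 - i) * w + k[i])
            n += c * v[i]
            d += c
        y.append(n / d)
    return y

def wkv_stable(w, u, k, v):
    # Same quantity, computed serially with the 'pp' overflow guard:
    # every exponent is shifted by the running maximum before exp().
    aa, bb, pp = 0.0, 0.0, float('-inf')
    y = []
    for t in range(len(k)):
        ww = u + k[t]
        p = max(pp, ww)
        e1, e2 = math.exp(pp - p), math.exp(ww - p)
        y.append((e1 * aa + e2 * v[t]) / (e1 * bb + e2))
        ww = pp + w                      # decay the stored state by w
        p = max(ww, k[t])
        e1, e2 = math.exp(ww - p), math.exp(k[t] - p)
        aa = e1 * aa + e2 * v[t]
        bb = e1 * bb + e2
        pp = p
    return y

w, u = -0.4, 0.2                 # hypothetical decay / bonus values
k = [0.1, 1.5, -0.7, 2.0]
v = [1.0, -2.0, 0.5, 3.0]

# Both forms agree on moderate inputs...
assert all(abs(a - b) < 1e-9
           for a, b in zip(wkv_naive(w, u, k, v), wkv_stable(w, u, k, v)))

# ...but only the 'pp' version survives large keys (exp(800) overflows float64).
big_k = [ki + 800 for ki in k]
assert all(math.isfinite(x) for x in wkv_stable(w, u, big_k, v))
```

Note that shifting every `k` by a constant leaves the WKV ratio unchanged, which is exactly why referencing exponents to the running maximum `pp` is safe.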