zhouchenlin2096 / QKFormer

Official code of "QKFormer: Hierarchical Spiking Transformer using Q-K Attention" (NeurIPS 2024, Spotlight 3%)

The input to QK Attention is not in spike form #5

Closed · KTMTL closed this 1 month ago

KTMTL commented 1 month ago

According to the released code, the output of PatchEmbeddingStage is x + x_feat, which is not a spike tensor, yet it is fed directly into QK Attention. This does not match the QK Attention input described in the paper. However, when I add a LIF node after x + x_feat, accuracy drops by about 2%. Why is that?
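For concreteness, here is a minimal sketch of the modification described above. The layer names are hypothetical and the shortcut is shown as an identity for brevity; this is not the repo's actual PatchEmbeddingStage. The spiking neuron mirrors the spikingjelly MultiStepLIFNode used in Spikformer-style code (older `clock_driven` API).

```python
import torch
import torch.nn as nn
from spikingjelly.clock_driven.neuron import MultiStepLIFNode  # spikingjelly <= 0.0.0.0.12 API


class PatchEmbeddingStageSketch(nn.Module):
    """Simplified stand-in for PatchEmbeddingStage; only the ending matters here."""

    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)
        # the extra neuron in question: re-binarizes the residual sum
        self.out_lif = MultiStepLIFNode(tau=2.0, detach_reset=True)

    def forward(self, x):                     # x: [T, B, C, H, W], binary spikes
        x_feat = x                            # shortcut branch (identity here for brevity)
        T, B, C, H, W = x.shape
        x = self.bn(self.conv(x.flatten(0, 1))).reshape(T, B, C, H, W)
        x = self.lif(x)                       # main branch ends in spikes
        x = x + x_feat                        # SEW-style sum: values in {0, 1, 2}, not binary
        return self.out_lif(x)                # extra LIF restores binary spikes (the change asked about)
```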

KTMTL commented 1 month ago

[Screenshot attachment showing the code in question]

zhouchenlin2096 commented 1 month ago

> According to the released code, the output of PatchEmbeddingStage is x + x_feat, which is not a spike tensor, yet it is fed directly into QK Attention. This does not match the QK Attention input described in the paper. However, when I add a LIF node after x + x_feat, accuracy drops by about 2%. Why is that?

1. This comes from the SEW addition (activation-before-addition shortcut) [1, 2], which does indeed lead to spike addition, so the residual sum is no longer binary (see the sketch after this list). In principle, this type of residual can still be deployed on event-driven neuromorphic chips (e.g., SynSense) by carrying out the addition iteratively.
[1] Deep Residual Learning in Spiking Neural Networks (SEW addition)
[2] Spikformer: When Spiking Neural Network Meets Transformer (SEW addition, RPE addition)

2. Following SEW-ResNet and Spikformer, QKFormer uses SEW addition in the main experiments. In addition, models built with the alternative MS addition (pre-activation residual shortcut) [3, 4] are evaluated in an ablation study in the paper.
[3] Advancing Spiking Neural Networks Towards Deep Residual Learning
[4] Direct training high-performance deep spiking neural networks: a review of theories and methods
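For readers unfamiliar with the two shortcut styles, the following is a minimal, self-contained sketch contrasting SEW addition with MS addition. Module names and tensor shapes are assumptions for illustration, not QKFormer's actual blocks; the neuron again follows the spikingjelly MultiStepLIFNode API used in Spikformer-style code.

```python
import torch
import torch.nn as nn
from spikingjelly.clock_driven.neuron import MultiStepLIFNode


class SEWBlockSketch(nn.Module):
    """Activation-before-addition (SEW) shortcut: both branches are spike trains,
    so the residual sum can take values in {0, 1, 2} and is no longer binary."""

    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)

    def forward(self, x):                     # x: [T, B, C, H, W], binary spikes
        T, B, C, H, W = x.shape
        y = self.bn(self.conv(x.flatten(0, 1))).reshape(T, B, C, H, W)
        y = self.lif(y)                       # spike activation BEFORE the addition
        return y + x                          # sum of two spike trains: values 0/1/2


class MSBlockSketch(nn.Module):
    """Pre-activation (MS) shortcut: the LIF sits at the front of the residual
    branch, the shortcut carries real-valued (membrane-potential-like) features,
    and no spike-plus-spike addition occurs."""

    def __init__(self, dim=64):
        super().__init__()
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)

    def forward(self, x):                     # x: [T, B, C, H, W], real-valued
        T, B, C, H, W = x.shape
        y = self.lif(x)                       # spike activation FIRST (pre-activation)
        y = self.bn(self.conv(y.flatten(0, 1))).reshape(T, B, C, H, W)
        return y + x                          # real-valued shortcut; convolutions only see spikes
```

In the SEW block the residual sum can exceed 1, which is why inserting an extra LIF (as in the question above) restores binary spikes at the cost of some accuracy; in the MS block the shortcut carries real values while the weighted layers only ever receive spikes.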