According to the provided code, the output of PatchEmbeddingStage is x + x_feat, which is not a spike tensor, yet it is fed directly into QK Attention. This does not match the QK Attention input described in the paper. However, when I add an LIF after x + x_feat, accuracy drops by about 2%. Why is that?
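For concreteness, a minimal, self-contained sketch of the situation being asked about is shown below; the tensor shapes, the hard 0/1 threshold standing in for the LIF neuron, and the variable names are illustrative assumptions, not the repository's actual code:

```python
import torch

# Two binary spike maps, standing in for x and x_feat inside PatchEmbeddingStage.
x = (torch.rand(1, 8, 4, 4) > 0.5).float()
x_feat = (torch.rand(1, 8, 4, 4) > 0.5).float()

patch_out = x + x_feat              # values in {0, 1, 2}: not a binary spike tensor
print(patch_out.unique())           # typically tensor([0., 1., 2.])

# The modification asked about: re-binarizing before QK Attention.
# A hard threshold stands in for the LIF neuron here.
patch_out_spiking = (patch_out >= 1.0).float()
print(patch_out_spiking.unique())   # tensor([0., 1.])
```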
1. This comes from SEW addition (the activation-before-addition shortcut) [1, 2], which indeed adds spikes together and therefore produces non-binary values. In principle, this kind of residual can still be deployed on event-driven neuromorphic chips by applying it iteratively, e.g. SynSense.
[1] Deep Residual Learning in Spiking Neural Networks (SEW addition)
[2] Spikformer: When Spiking Neural Network Meets Transformer (SEW addition, RPE addition)
2. Following SEW-ResNet and Spikformer, QKFormer uses SEW addition in the main-text experiments. The paper also reports an ablation study on models built with the alternative MS addition (pre-activation residual shortcut) [3, 4]. A minimal sketch contrasting the two residual styles is given after this list.
[3] Advancing Spiking Neural Networks Towards Deep Residual Learning
[4] Direct Training High-Performance Deep Spiking Neural Networks: A Review of Theories and Methods
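To make the two shortcut styles concrete, here is a hedged sketch, not the QKFormer implementation: the StandInLIF hard threshold and the single conv layer are placeholders for the real LIF neuron and conv/BN branch. It contrasts SEW addition, where two spike tensors are added and the result can reach 2, with MS addition, where the addition happens on membrane-potential-like values and the next block's LIF re-binarizes them:

```python
import torch
import torch.nn as nn

class StandInLIF(nn.Module):
    """Toy surrogate for an LIF node: hard threshold to {0, 1}."""
    def forward(self, x):
        return (x >= 1.0).float()

class SEWBlock(nn.Module):
    """SEW-style (activation-before-addition): the shortcut adds two spike
    tensors, so the block output can take values in {0, 1, 2}, i.e. non-binary."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.lif = StandInLIF()

    def forward(self, x_spike):
        return x_spike + self.lif(self.conv(x_spike))   # spike + spike

class MSBlock(nn.Module):
    """MS-style (pre-activation): the LIF comes first, the conv only sees spikes,
    and the addition happens on membrane-potential-like float values that the
    next block's LIF re-binarizes."""
    def __init__(self, channels):
        super().__init__()
        self.lif = StandInLIF()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        return x + self.conv(self.lif(x))                # float + float

if __name__ == "__main__":
    x = (torch.rand(1, 8, 4, 4) > 0.5).float()           # binary spike input
    print(SEWBlock(8)(x).unique())                       # typically contains 2.0
    y_ms = MSBlock(8)(x)
    print(torch.all((y_ms == 0) | (y_ms == 1)))          # typically False: continuous values
```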