zhouchenlin2096 / QKFormer

Official code of "QKFormer: Hierarchical Spiking Transformer using Q-K Attention" (NeurIPS 2024, Spotlight 3%)

The input to QK Attention is not in spike form #5

Closed · KTMTL closed this 1 month ago

KTMTL commented 1 month ago

According to the released code, the output of PatchEmbeddingStage is x + x_feat, which is not a spike tensor, yet it is fed directly into QK Attention. This does not match the QK Attention input described in the paper. However, when I add a LIF node after x + x_feat, accuracy drops by about 2%. Why is that?
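For concreteness, here is a minimal sketch of the modification described above. The layer names are hypothetical and the shortcut is shown as an identity for brevity; this is not the repo's actual PatchEmbeddingStage. The spiking neuron mirrors the spikingjelly MultiStepLIFNode used in Spikformer-style code (older `clock_driven` API).

```python
import torch
import torch.nn as nn
from spikingjelly.clock_driven.neuron import MultiStepLIFNode  # spikingjelly <= 0.0.0.0.12 API


class PatchEmbeddingStageSketch(nn.Module):
    """Simplified stand-in for PatchEmbeddingStage; only the ending matters here."""

    def __init__(self, dim=256):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)
        # the extra neuron in question: re-binarizes the residual sum
        self.out_lif = MultiStepLIFNode(tau=2.0, detach_reset=True)

    def forward(self, x):                     # x: [T, B, C, H, W], binary spikes
        x_feat = x                            # shortcut branch (identity here for brevity)
        T, B, C, H, W = x.shape
        x = self.bn(self.conv(x.flatten(0, 1))).reshape(T, B, C, H, W)
        x = self.lif(x)                       # main branch ends in spikes
        x = x + x_feat                        # SEW-style sum: values in {0, 1, 2}, not binary
        return self.out_lif(x)                # extra LIF restores binary spikes (the change asked about)
```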

KTMTL commented 1 month ago

[Screenshot attachment showing the code in question]

zhouchenlin2096 commented 1 month ago

> According to the released code, the output of PatchEmbeddingStage is x + x_feat, which is not a spike tensor, yet it is fed directly into QK Attention. This does not match the QK Attention input described in the paper. However, when I add a LIF node after x + x_feat, accuracy drops by about 2%. Why is that?

1. This comes from the SEW addition (activation-before-addition shortcut) [1, 2], which does indeed lead to spike addition, so the residual sum is no longer binary (see the sketch after this list). In principle, this type of residual can still be deployed on event-driven neuromorphic chips (e.g., SynSense) by carrying out the addition iteratively.
[1] Deep Residual Learning in Spiking Neural Networks (SEW addition)
[2] Spikformer: When Spiking Neural Network Meets Transformer (SEW addition, RPE addition)

2. Following SEW-ResNet and Spikformer, QKFormer uses SEW addition in the main experiments. In addition, models built with the alternative MS addition (pre-activation residual shortcut) [3, 4] are evaluated in an ablation study in the paper.
[3] Advancing Spiking Neural Networks Towards Deep Residual Learning
[4] Direct training high-performance deep spiking neural networks: a review of theories and methods
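For readers unfamiliar with the two shortcut styles, the following is a minimal, self-contained sketch contrasting SEW addition with MS addition. Module names and tensor shapes are assumptions for illustration, not QKFormer's actual blocks; the neuron again follows the spikingjelly MultiStepLIFNode API used in Spikformer-style code.

```python
import torch
import torch.nn as nn
from spikingjelly.clock_driven.neuron import MultiStepLIFNode


class SEWBlockSketch(nn.Module):
    """Activation-before-addition (SEW) shortcut: both branches are spike trains,
    so the residual sum can take values in {0, 1, 2} and is no longer binary."""

    def __init__(self, dim=64):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)

    def forward(self, x):                     # x: [T, B, C, H, W], binary spikes
        T, B, C, H, W = x.shape
        y = self.bn(self.conv(x.flatten(0, 1))).reshape(T, B, C, H, W)
        y = self.lif(y)                       # spike activation BEFORE the addition
        return y + x                          # sum of two spike trains: values 0/1/2


class MSBlockSketch(nn.Module):
    """Pre-activation (MS) shortcut: the LIF sits at the front of the residual
    branch, the shortcut carries real-valued (membrane-potential-like) features,
    and no spike-plus-spike addition occurs."""

    def __init__(self, dim=64):
        super().__init__()
        self.lif = MultiStepLIFNode(tau=2.0, detach_reset=True)
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(dim)

    def forward(self, x):                     # x: [T, B, C, H, W], real-valued
        T, B, C, H, W = x.shape
        y = self.lif(x)                       # spike activation FIRST (pre-activation)
        y = self.bn(self.conv(y.flatten(0, 1))).reshape(T, B, C, H, W)
        return y + x                          # real-valued shortcut; convolutions only see spikes
```

In the SEW block the residual sum can exceed 1, which is why inserting an extra LIF (as in the question above) restores binary spikes at the cost of some accuracy; in the MS block the shortcut carries real values while the weighted layers only ever receive spikes.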