qitianwu / DIFFormer

The official implementation of the ICLR 2023 spotlight paper "DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion"

Bug in the attention computation #9

Closed 376498485 closed 1 year ago

376498485 commented 1 year ago

When kernel is 'simple', the attention variable in full_attention_conv has a bug: its shape does not match the one produced when kernel is 'sigmoid'. I believe the trailing .unsqueeze(2) on line 43 of difformer.py should be removed; that appears to make the shape correct.
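
To make the shape mismatch concrete, here is a minimal PyTorch sketch of the two kernel branches, reduced to just the attention-matrix computation. The tensor names and the [N, H, M] layout follow the issue's description; the normalizer here is a simplified stand-in, not the repo's exact code.

```python
import torch

N, H, M = 5, 2, 4          # toy sizes: nodes, heads, feature dim
qs = torch.randn(N, H, M)  # queries [N, H, M]
ks = torch.randn(N, H, M)  # keys    [N, H, M]

# 'sigmoid' kernel: the attention matrix comes out as [N, N, H]
attn_sigmoid = torch.sigmoid(torch.einsum("nhm,lhm->nlh", qs, ks))

# 'simple' kernel with the reported bug: the trailing .unsqueeze(2)
# inserts an extra axis, giving [N, N, 1, H] instead of [N, N, H]
normalizer = torch.einsum("nhm,lhm->nlh", qs, ks).sum(dim=1, keepdim=True)  # [N, 1, H]
attn_buggy = (torch.einsum("nhm,lhm->nlh", qs, ks) / normalizer).unsqueeze(2)

# the proposed fix: drop .unsqueeze(2) so both kernels agree on [N, N, H]
attn_fixed = torch.einsum("nhm,lhm->nlh", qs, ks) / normalizer

print(attn_sigmoid.shape)  # torch.Size([5, 5, 2])
print(attn_buggy.shape)    # torch.Size([5, 5, 1, 2])
print(attn_fixed.shape)    # torch.Size([5, 5, 2])
```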

qitianwu commented 1 year ago

Thanks for pointing this out; it has been fixed. Note that line 43 is only used when the attention needs to be visualized (i.e., when an N*N attention matrix must be output); it is not needed in the model's forward computation. This was most likely overlooked when the code was finally consolidated.
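
A hedged sketch of the separation the author describes: the forward pass uses the linear-attention form and never materializes an N*N tensor, while the dense matrix is built only behind a flag for visualization. The function name, the output_attn flag, and the +N stabilizer here are assumptions based on this thread and the paper, not the repo verbatim; the residual value term of the full update is omitted for brevity.

```python
import torch

def full_attention_conv_sketch(qs, ks, vs, output_attn=False):
    """Simplified 'simple'-kernel path (sketch). qs/ks: [N, H, M], vs: [N, H, D]."""
    # global L2 normalization keeps the denominator well behaved (assumption)
    qs = qs / torch.norm(qs, p=2)
    ks = ks / torch.norm(ks, p=2)
    N = qs.shape[0]

    # linear attention: contract keys with values first, so no N x N tensor
    kvs = torch.einsum("lhm,lhd->hmd", ks, vs)                       # [H, M, D]
    attn_num = torch.einsum("nhm,hmd->nhd", qs, kvs)                 # [N, H, D]
    normalizer = torch.einsum("nhm,hm->nh", qs, ks.sum(dim=0)) + N   # [N, H]
    attn_output = attn_num / normalizer.unsqueeze(-1)                # [N, H, D]

    if output_attn:
        # dense [N, N, H] matrix, built only for visualization; no trailing
        # .unsqueeze(2), so the shape matches the 'sigmoid' branch
        attention = torch.einsum("nhm,lhm->nlh", qs, ks) / normalizer.unsqueeze(1)
        return attn_output, attention
    return attn_output

qs, ks = torch.randn(5, 2, 4), torch.randn(5, 2, 4)
vs = torch.randn(5, 2, 8)
out, attn = full_attention_conv_sketch(qs, ks, vs, output_attn=True)
print(out.shape, attn.shape)  # torch.Size([5, 2, 8]) torch.Size([5, 5, 2])
```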