qitianwu / DIFFormer

The official implementation of the ICLR 2023 spotlight paper "DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion"

About Attention Calculation #12

Closed: demo4ai closed this issue 1 year ago

demo4ai commented 1 year ago

Thanks for sharing your code! I am currently encountering an issue.

I tried calling the function full_attention_conv:

```python
import torch
from difformer import full_attention_conv  # defined in the repo's difformer.py

x = torch.randn(25, 4, 16)
a = full_attention_conv(x, x, x, kernel='simple', output_attn=True)
```

It raises the following error:

`RuntimeError: The size of tensor a (25) must match the size of tensor b (4) at non-singleton dimension 1`

qitianwu commented 1 year ago

Hi, the reason could be that you set num_heads=1 while your input x suggests the number of heads should be 4.
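For reference, a minimal sketch of a call whose shapes are consistent with a single-head setup (num_heads=1), assuming full_attention_conv expects inputs shaped [N, num_heads, head_dim] and returns (output, attention) when output_attn=True; the import path is taken from the repo's difformer.py and may differ in your checkout:

```python
import torch
from difformer import full_attention_conv  # path may differ depending on which subfolder you use

N, num_heads, head_dim = 25, 1, 16       # single head, consistent with num_heads=1
x = torch.randn(N, num_heads, head_dim)  # [N, num_heads, head_dim]

# query, key, and value are all x, as in the original report
out, attn = full_attention_conv(x, x, x, kernel='simple', output_attn=True)

print(out.shape)   # expected: torch.Size([25, 1, 16])
print(attn.shape)  # pairwise attention over the 25 nodes
```

With a single head, the attention matrix broadcasts cleanly against its normalizer, which avoids the size mismatch at dimension 1 seen in the original report.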