rampasek / GraphGPS

Recipe for a General, Powerful, Scalable Graph Transformer
MIT License

question about computational complexity in global-attn #41

Closed: XingBin111 closed this issue 8 months ago

XingBin111 commented 9 months ago

The paper claims that global attention has linear complexity in the number of nodes, but the code (https://github.com/rampasek/GraphGPS/blob/main/graphgps/layer/gps_layer.py#L201) looks quadratic. Is it linear or quadratic?

migalkin commented 9 months ago

If you set `global_model_type` to the vanilla Transformer, then yes, it's $O(N^2)$.

Just a few lines below, you can select Performer as the global attention module instead, and Performer's attention is linear in the number of nodes:

https://github.com/rampasek/GraphGPS/blob/28015707cbab7f8ad72bed0ee872d068ea59c94b/graphgps/layer/gps_layer.py#L205-L206
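For anyone else wondering where the asymptotic difference comes from, here is a minimal, self-contained sketch (not the GraphGPS code; the function names and the simple positive feature map are illustrative assumptions, not FAVOR+): vanilla self-attention materializes an $N \times N$ score matrix, while a Performer-style kernelized attention factorizes it and never forms that matrix.

```python
import torch
import torch.nn.functional as F

# N = number of nodes, d = hidden dimension (toy sizes for illustration).

def dense_attention(q, k, v):
    # Vanilla softmax attention: materializes an N x N score matrix
    # -> O(N^2) time and memory in the number of nodes.
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return scores @ v

def kernelized_attention(q, k, v):
    # Performer-style idea: replace softmax with a positive feature map phi(.)
    # so attention factorizes as phi(Q) @ (phi(K)^T V); the N x N matrix is
    # never formed. (elu(x) + 1 is a simple stand-in for the FAVOR+ features.)
    q_prime = F.elu(q) + 1.0                              # (N, d)
    k_prime = F.elu(k) + 1.0                              # (N, d)
    kv = k_prime.transpose(-2, -1) @ v                    # (d, d), O(N * d^2)
    normalizer = q_prime @ k_prime.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (q_prime @ kv) / normalizer                    # (N, d), linear in N

N, d = 1000, 64
q, k, v = torch.randn(N, d), torch.randn(N, d), torch.randn(N, d)
out_dense = dense_attention(q, k, v)        # quadratic in N
out_linear = kernelized_attention(q, k, v)  # linear in N
print(out_dense.shape, out_linear.shape)
```

So the GPS layer's overall complexity depends on which global attention module you pick: the Transformer branch is quadratic, while the Performer branch linked above is linear.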