rampasek / GraphGPS

Recipe for a General, Powerful, Scalable Graph Transformer
MIT License

question about computational complexity in global-attn #41

Closed: XingBin111 closed this issue 8 months ago

XingBin111 commented 9 months ago

The paper claims that global attention has linear complexity in the number of nodes, but the code (https://github.com/rampasek/GraphGPS/blob/main/graphgps/layer/gps_layer.py#L201) looks quadratic. Is it linear or quadratic?

migalkin commented 9 months ago

If you set `global_model_type` to the vanilla Transformer, then yes, it's $O(N^2)$.

Just a few lines below, you can select Performer as the global attention module instead, and Performer's attention is linear in the number of nodes:

https://github.com/rampasek/GraphGPS/blob/28015707cbab7f8ad72bed0ee872d068ea59c94b/graphgps/layer/gps_layer.py#L205-L206
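For anyone else wondering where the asymptotic difference comes from, here is a minimal, self-contained sketch (not the GraphGPS code; the function names and the simple positive feature map are illustrative assumptions, not FAVOR+): vanilla self-attention materializes an $N \times N$ score matrix, while a Performer-style kernelized attention factorizes it and never forms that matrix.

```python
import torch
import torch.nn.functional as F

# N = number of nodes, d = hidden dimension (toy sizes for illustration).

def dense_attention(q, k, v):
    # Vanilla softmax attention: materializes an N x N score matrix
    # -> O(N^2) time and memory in the number of nodes.
    scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    return scores @ v

def kernelized_attention(q, k, v):
    # Performer-style idea: replace softmax with a positive feature map phi(.)
    # so attention factorizes as phi(Q) @ (phi(K)^T V); the N x N matrix is
    # never formed. (elu(x) + 1 is a simple stand-in for the FAVOR+ features.)
    q_prime = F.elu(q) + 1.0                              # (N, d)
    k_prime = F.elu(k) + 1.0                              # (N, d)
    kv = k_prime.transpose(-2, -1) @ v                    # (d, d), O(N * d^2)
    normalizer = q_prime @ k_prime.sum(dim=-2, keepdim=True).transpose(-2, -1)  # (N, 1)
    return (q_prime @ kv) / normalizer                    # (N, d), linear in N

N, d = 1000, 64
q, k, v = torch.randn(N, d), torch.randn(N, d), torch.randn(N, d)
out_dense = dense_attention(q, k, v)        # quadratic in N
out_linear = kernelized_attention(q, k, v)  # linear in N
print(out_dense.shape, out_linear.shape)
```

So the GPS layer's overall complexity depends on which global attention module you pick: the Transformer branch is quadratic, while the Performer branch linked above is linear.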