Closed XingBin111 closed 8 months ago
The paper claims that global attention is linear complexity in the number of nodes, but the code (https://github.com/rampasek/GraphGPS/blob/main/graphgps/layer/gps_layer.py#L201) looks quadratic. Is it linear or quadratic?
If you use `global_model_type` as a vanilla Transformer, then yes, it's $O(N^2)$.
Just a few lines below, you can select Performer for the global attention instead, and Performer attention is linear in the number of nodes:
https://github.com/rampasek/GraphGPS/blob/28015707cbab7f8ad72bed0ee872d068ea59c94b/graphgps/layer/gps_layer.py#L205-L206
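To illustrate the distinction, here is a minimal NumPy sketch (names hypothetical). The vanilla version materializes an $N \times N$ score matrix, hence $O(N^2)$; the kernelized version (the idea behind Performer, shown here with a simple positive feature map rather than Performer's random features) reorders the matrix products so no $N \times N$ matrix is ever formed, giving $O(N)$ in the number of nodes:

```python
import numpy as np

def quadratic_attention(Q, K, V):
    # Vanilla softmax attention: the N x N score matrix makes this O(N^2) in N.
    scores = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Kernelized attention: compute phi(K)^T V (a d x d matrix) first,
    # so the cost is O(N * d^2) -- linear in the number of nodes N.
    # phi is a stand-in positive feature map, not Performer's FAVOR+ features.
    KV = phi(K).T @ V                    # (d, d)
    Z = phi(Q) @ phi(K).sum(axis=0)      # (N,) normalizer
    return (phi(Q) @ KV) / Z[:, None]    # (N, d)

N, d = 128, 16
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, N, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (128, 16)
```

Both functions map `(N, d)` inputs to `(N, d)` outputs; only the order of multiplications differs, which is exactly what makes the linear variant scale to large graphs.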