It seems like the RGA module is different from an ordinary attention module, such as the one used in ABD-Net.
According to your article, the main advantage of RGA is that it can extract global relation information at a lower computational cost. However, the spatial and channel attention modules in ABD-Net seem to be more efficient, since they add no extra computational cost for feature embedding.
Could you tell me what the advantage of RGA is over an ordinary attention module?
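To make the comparison concrete, here is a minimal sketch of the two flavours of spatial attention I have in mind. It is only my own rough understanding, not the actual ABD-Net or RGA code: the class names, reduction ratio, and shapes are illustrative assumptions. The first module predicts a per-position weight directly from the feature map; the second first builds the pairwise relation matrix and then embeds it, which is where the extra feature-embedding cost I mentioned comes from.

```python
import torch
import torch.nn as nn


class SimpleSpatialAttention(nn.Module):
    """Sketch of an 'ordinary' spatial attention: a small conv head predicts
    one weight per spatial position directly from the feature map, so no
    pairwise relation matrix is built. (Illustrative only, not ABD-Net's module.)"""

    def __init__(self, in_channels, reduction=8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, in_channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels // reduction, 1, kernel_size=1),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        attn = torch.sigmoid(self.conv(x))       # (B, 1, H, W)
        return x * attn


class RelationAwareSpatialAttention(nn.Module):
    """Sketch of a relation-aware spatial attention in the spirit of RGA-S:
    pairwise relations between all positions are computed and then embedded
    (the extra 'feature embedding' step) before producing the per-position
    weight. (Illustrative only, not the authors' implementation.)"""

    def __init__(self, in_channels, spatial_size, reduction=8):
        super().__init__()
        n = spatial_size  # number of positions, must equal H * W at runtime
        self.embed_q = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.embed_k = nn.Conv2d(in_channels, in_channels // reduction, 1)
        # each position gets a relation vector of length 2*n (in- and out-relations)
        self.embed_relation = nn.Sequential(
            nn.Conv2d(2 * n, n // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(n // reduction, 1, 1),
        )

    def forward(self, x):                              # x: (B, C, H, W)
        b, c, h, w = x.shape
        n = h * w
        q = self.embed_q(x).reshape(b, -1, n)          # (B, C', N)
        k = self.embed_k(x).reshape(b, -1, n)          # (B, C', N)
        relation = torch.bmm(q.transpose(1, 2), k)     # (B, N, N) pairwise affinities
        # stack "how every j relates to i" and "how i relates to every j" per position i
        rel_feat = torch.cat([relation, relation.transpose(1, 2)], dim=1)  # (B, 2N, N)
        rel_feat = rel_feat.reshape(b, 2 * n, h, w)    # treat relation vectors as channels
        attn = torch.sigmoid(self.embed_relation(rel_feat))  # (B, 1, H, W)
        return x * attn


# quick shape check
x = torch.randn(2, 256, 8, 8)
plain = SimpleSpatialAttention(256)
rga_like = RelationAwareSpatialAttention(256, spatial_size=8 * 8)
print(plain(x).shape, rga_like(x).shape)  # both: torch.Size([2, 256, 8, 8])
```

If my sketch of the second module is roughly right, it is the N x N relation matrix and the extra embedding convolutions on top of it that I was referring to as the added cost.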