First of all - this is a great repo and thank you for this. The pyg version however has some bugs with the attention.
Just a few that I have encountered:
- In the `forward` method, the attention layer is at index -1 (not 0) and the EGNN layer is at index 0 (not -1), which is the opposite of the other implementation.
- The `self.global_tokens` init references an undefined variable `dim`.
- It uses `GlobalLinearAttention` from the other implementation even though `GlobalLinearAttention_Sparse` is defined in the file (not sure if this is a bug or on purpose?).
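To illustrate the `dim` issue, here is a minimal hypothetical sketch (the class name and constructor signature are assumed, not taken from the repo): the fix is simply to make `dim` a constructor argument so it is defined before the global token parameter is sized.

```python
import torch
from torch import nn

class GlobalAttentionBlock(nn.Module):
    """Hypothetical sketch of the fix for the undefined `dim`."""
    def __init__(self, dim, num_global_tokens=4):
        super().__init__()
        # Before the fix, `dim` was referenced here without being
        # defined anywhere in scope, raising a NameError at init time.
        self.global_tokens = nn.Parameter(torch.randn(num_global_tokens, dim))

block = GlobalAttentionBlock(dim=64)
print(block.global_tokens.shape)  # torch.Size([4, 64])
```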
I have refactored a lot of the code and can try to open a PR in a few days.