lucidrains / En-transformer

Implementation of E(n)-Transformer, which incorporates attention mechanisms into Welling's E(n)-Equivariant Graph Neural Network

On rotary embeddings #3

Closed · chaitjo closed this issue 1 year ago

chaitjo commented 3 years ago

Hi @lucidrains, thank you for your amazing work; big fan! I had a quick question on the usage of this repository.

Based on my understanding, rotary embeddings are a drop-in replacement for the original sinusoidal or learned positional encodings in Transformers for sequential data, as in NLP or other temporal applications. If my application does not involve sequential data, is there a reason why I should still use rotary embeddings?
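For context on why I'm asking: a minimal sketch of the core rotary operation (not the exact implementation in this repository) shows that it presupposes an ordering of the tokens.

```python
import torch

def apply_rotary(x, base = 10000):
    # x: (seq_len, dim) queries or keys, dim assumed even.
    # Each consecutive channel pair is rotated by an angle proportional
    # to the token's position, so the dot product q · k ends up depending
    # on the relative offset between the two positions.
    seq_len, dim = x.shape
    pos = torch.arange(seq_len, dtype = torch.float32)
    inv_freq = base ** (-torch.arange(0, dim, 2, dtype = torch.float32) / dim)
    angles = pos[:, None] * inv_freq[None, :]   # (seq_len, dim // 2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]             # channel pairs
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# For an ordered sequence the rotation injects position information into
# attention; for an unordered set (e.g. the atoms of a molecule) there is
# no canonical ordering for it to encode.
q = apply_rotary(torch.randn(128, 64))
```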

E.g., for molecular datasets such as QM9 (from the E(n)-GNN paper), would it make sense to use rotary embeddings?
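For reference, this is roughly how I would toggle the option when instantiating the model. I'm assuming the `rel_pos_emb` flag shown in the README controls the rotary embeddings; the exact argument names may differ in the current version.

```python
import torch
from en_transformer import EnTransformer

# rel_pos_emb toggles the rotary (relative positional) embeddings.
# The flag name follows the README at the time of writing.
model = EnTransformer(
    dim = 512,
    depth = 4,
    dim_head = 64,
    heads = 8,
    neighbors = 16,
    rel_pos_emb = False   # leave off for unordered atom sets like QM9?
)

feats = torch.randn(1, 32, 512)   # per-node features
coors = torch.randn(1, 32, 3)     # 3D coordinates
feats_out, coors_out = model(feats, coors)
```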

hypnopump commented 3 years ago

Hi there!

I think in principle there's no reason to use them, although if you try, please report your results.

Best, Eric