lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

[Question] Why is RotaryEmbedding not used when cross attending? #258

Open pfeatherstone opened 2 months ago

pfeatherstone commented 2 months ago

Why is RotaryEmbedding not used when cross attending?

https://github.com/lucidrains/x-transformers/blob/80be13468065a720fb5ca92cb6d6dbcf5a204913/x_transformers/x_transformers.py#L1095-L1103
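
For context on the question, here is a minimal sketch of the property rotary embeddings provide: each query and key is rotated by an angle proportional to its position, so the attention score depends only on the offset between the two positions. In cross attention the queries and keys index two different sequences, so that offset may not be meaningful unless the sequences are aligned. This is background only and not a statement of the maintainer's reasoning. The `rotate` and `dot` helpers are hypothetical, written in plain Python for illustration, and are not the library's implementation.

```python
import math


def rotate(x, pos, base=10000.0):
    """Rotate vector x as a rotary embedding would at integer position pos.

    Hypothetical helper: pairs element i with element i + half and rotates
    each pair by pos * base ** (-i / half).
    """
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        angle = pos * base ** (-i / half)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + half] * s
        out[i + half] = x[i] * s + x[i + half] * c
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention logit."""
    return sum(u * v for u, v in zip(a, b))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

# Same offset (3) at very different absolute positions.
score_near = dot(rotate(q, 5), rotate(k, 2))
score_far = dot(rotate(q, 105), rotate(k, 102))

# Different offset (1) with the same query position.
score_other = dot(rotate(q, 5), rotate(k, 4))

print(math.isclose(score_near, score_far, abs_tol=1e-9))
print(math.isclose(score_near, score_other, abs_tol=1e-9))
```

The first comparison holds because only the offset between positions enters the score; the second uses a different offset, so the score changes.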

Jourdelune commented 1 month ago

Hey, this may be related to https://github.com/lucidrains/x-transformers/issues/38. Maybe the library should raise an error instead of silently disabling rotary embeddings.
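
The suggestion above (raising an error rather than disabling the feature) could look roughly like the sketch below. Everything here is hypothetical: `LayerConfig`, `resolve_rotary`, and the `strict` flag are made-up names for illustration and do not reflect the library's actual classes or code path. The sketch also assumes that rotary embeddings combined with cross attention is treated as an unsupported combination.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    """Hypothetical stand-in for layer settings; not the library's real class."""
    cross_attend: bool = False
    rotary_pos_emb: bool = False


def resolve_rotary(config: LayerConfig, strict: bool = True) -> bool:
    """Decide whether rotary embeddings are applied (hypothetical helper).

    strict=True raises on the assumed-unsupported combination.
    strict=False drops rotary embeddings without telling the caller.
    """
    if config.cross_attend and config.rotary_pos_emb:
        if strict:
            raise ValueError(
                "rotary_pos_emb is not applied when cross attending; "
                "disable one of the two options"
            )
        return False
    return config.rotary_pos_emb


config = LayerConfig(cross_attend=True, rotary_pos_emb=True)

# Silent behaviour: the setting is ignored and the caller gets no signal.
print(resolve_rotary(config, strict=False))

# Strict behaviour: the conflict surfaces immediately.
try:
    resolve_rotary(config)
except ValueError as err:
    print(f"error: {err}")
```

Raising makes the mismatch visible at configuration time, at the cost of breaking setups that currently rely on the silent fallback.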