Open gaceladri opened 3 years ago
@gaceladri Hi again! I have it built into Performers; you can try it out there and use it as an example: https://github.com/lucidrains/performer-pytorch
you would simply apply the rotary embeddings right after the feature maps:
Q = self.feature_map.forward_queries(queries)
K = self.feature_map.forward_keys(keys)
Q, K = apply_rot_emb(Q, K, sinu_emb)
the sinusoidal embeddings must be calculated once at the start and then passed to each attention block
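For reference, here is a minimal, self-contained sketch of that flow (the helper names sinusoidal_emb and rotate_half are my own, and the rotate-half formulation is one common rotary variant, not necessarily the exact one in performer-pytorch; apply_rot_emb follows the call shown above):

import torch

def sinusoidal_emb(seq_len, dim, device=None):
    # fixed sinusoidal frequencies, computed once and shared by every attention block
    inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(seq_len, device=device).float()
    freqs = torch.einsum('i,j->ij', t, inv_freq)   # (seq_len, dim / 2)
    return torch.cat((freqs, freqs), dim=-1)       # (seq_len, dim)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rot_emb(q, k, sinu_emb):
    # q, k: (..., seq_len, dim); sinu_emb: (seq_len, dim)
    cos, sin = sinu_emb.cos(), sinu_emb.sin()
    q = q * cos + rotate_half(q) * sin
    k = k * cos + rotate_half(k) * sin
    return q, k

# sinu_emb is built once for the full sequence, handed to every attention block,
# and applied right after the feature maps, as in the snippet above:
#   Q = self.feature_map.forward_queries(queries)
#   K = self.feature_map.forward_keys(keys)
#   Q, K = apply_rot_emb(Q, K, sinu_emb)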
Amazing! You are amazing! Thanks a lot, I will try it!!
@gaceladri make sure to turn off absolute positional embeddings when you try it! they conflict with rotary for some unknown reason - more research needed
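For illustration only, a rough sketch of what "turning off absolute positional embeddings" means in a custom embedding layer (the class and flag names here are hypothetical, not performer-pytorch's API): when rotary is used inside attention, the learned absolute positional embedding is simply never created or added.

import torch
import torch.nn as nn

class TokenEmbedding(nn.Module):
    # hypothetical wrapper: when rotary embeddings are applied inside attention,
    # drop the learned absolute positional embedding entirely
    def __init__(self, num_tokens, dim, max_seq_len, use_rotary=True):
        super().__init__()
        self.token_emb = nn.Embedding(num_tokens, dim)
        self.pos_emb = None if use_rotary else nn.Embedding(max_seq_len, dim)

    def forward(self, x):
        out = self.token_emb(x)
        if self.pos_emb is not None:
            positions = torch.arange(x.shape[1], device=x.device)
            out = out + self.pos_emb(positions)
        return out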
@gaceladri i had trouble making rotary work well with the linear attention in https://github.com/lucidrains/linear-attention , but i suspect it's because i'm using the softmax kernel there. it should work well with the elu kernel :crossed_fingers:
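For context, a rough sketch of non-causal linear attention with the elu kernel, with rotary applied after the feature map as in the snippet above (reusing apply_rot_emb and sinusoidal_emb from the earlier sketch; this is my own illustration, not the code in either repo):

import torch
import torch.nn.functional as F

def elu_feature_map(x):
    # elu(x) + 1 keeps the kernelized attention weights positive
    return F.elu(x) + 1

def linear_attention(q, k, v, sinu_emb):
    # q, k, v: (batch, heads, seq_len, dim_head)
    q, k = elu_feature_map(q), elu_feature_map(k)
    q, k = apply_rot_emb(q, k, sinu_emb)  # rotary applied after the feature map, per the thread
    kv = torch.einsum('bhnd,bhne->bhde', k, v)                         # sum_n k_n v_n^T
    z = 1.0 / (torch.einsum('bhnd,bhd->bhn', q, k.sum(dim=2)) + 1e-6)  # normalizer
    return torch.einsum('bhnd,bhde,bhn->bhne', q, kv, z)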
Hello Phil,
Would you mind explaining how to inject the rotary positional embeddings into the linear transformers?
Thanks!