🚀 The feature, motivation and pitch
Right now our implementation of RoPE assumes the rotation matrix is created and applied the way the HuggingFace model code does it, rather than in the form described in the original RoPE paper (https://arxiv.org/pdf/2104.09864).
We should also support use cases where people create their RoPE cos & sin buffers following the original formula.
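To make the difference concrete, here is a minimal pure-Python sketch of the two layouts for a single head vector at a single position. It is not our kernel code: the function names, the `base=10000.0` default, and the list-based representation are assumptions made for illustration. The paper rotates adjacent pairs `(x[2i], x[2i+1])`, while the HuggingFace-style `rotate_half` convention rotates pairs `(x[i], x[i + d/2])`.

```python
import math

def rope_angles(position, head_dim, base=10000.0):
    """Angles position * theta_i with theta_i = base^(-2i/d), one per 2-D pair."""
    return [position * base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

def rope_interleaved(x, position, base=10000.0):
    """Paper layout: rotate adjacent pairs (x[2i], x[2i+1])."""
    d = len(x)
    out = [0.0] * d
    for i, angle in enumerate(rope_angles(position, d, base)):
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out

def rope_half_split(x, position, base=10000.0):
    """HuggingFace-style layout: rotate pairs (x[i], x[i + d/2])."""
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i, angle in enumerate(rope_angles(position, d, base)):
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out

def interleaved_to_half_split(x):
    """Permutation that moves even indices to the first half, odd to the second."""
    return x[0::2] + x[1::2]
```

The two layouts differ only by a fixed permutation of the head dimension: `rope_half_split(interleaved_to_half_split(x), p)` equals `interleaved_to_half_split(rope_interleaved(x, p))`. Supporting the original formula is therefore mostly a matter of how the cos & sin buffers and the input pairs are indexed.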
Alternatives
We may also need to consider the complex form, i.e. what the official Meta Llama code does: https://github.com/meta-llama/llama/blob/6c7fe276574e78057f917549435a2554000a876d/llama/model.py#L64-L74
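For reference, a rough stdlib-only sketch of the complex form for one vector at one position. The linked code does this on tensors with `torch.polar` and `torch.view_as_complex`; the function name, the `base` default, and the list representation below are illustrative assumptions, not that code. Each adjacent pair `(x[2i], x[2i+1])` is treated as a complex number and multiplied by a unit complex number with angle `position * theta_i`.

```python
import cmath

def rope_complex(x, position, base=10000.0):
    """Apply RoPE by complex multiplication on adjacent pairs (hypothetical helper)."""
    d = len(x)
    out = []
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)                 # per-pair frequency
        rot = cmath.rect(1.0, position * theta)        # e^{i * position * theta}
        z = complex(x[2 * i], x[2 * i + 1]) * rot      # rotate the pair
        out.extend([z.real, z.imag])                   # back to interleaved reals
    return out
```

Since the complex product expands to `(a*cos - b*sin, a*sin + b*cos)`, this should match the paper's interleaved real-valued formula, so supporting one likely covers the other apart from the tensor layout.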
Additional context
No response