jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License

A potential bug in scaled_rope/LlamaDynamicScaledRotaryEmbedding.py #27

Open pengli09 opened 1 year ago

pengli09 commented 1 year ago
  1. The comment "# This if block is unlikely to be run after we build sin/cos in __init__. Keep the logic here just in case." might be incorrect. As I understand it, the code following this comment calculates the scale value from the actual length of the input, whereas the values cached in __init__ are unscaled. This branch should therefore be executed frequently.

  2. The new values for cos_cached and sin_cached shouldn't be cached. If they are cached, then once a long sample has been encountered, all subsequent samples will use the scaled values, regardless of their own length.
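
To illustrate both points, here is a minimal sketch in plain Python. It is not the repository's code: the class name DynamicRotarySketch, the helper functions, the tiny sizes, and the NTK-style rule used to scale the base are all assumptions made for illustration. It keeps the unscaled table built in __init__ for short inputs and recomputes scaled tables for long inputs without writing them back to the cache.

```python
import math


def inv_freq(dim, base):
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2).
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def cos_sin_table(seq_len, freqs):
    # One row per position, one column per frequency.
    cos = [[math.cos(t * f) for f in freqs] for t in range(seq_len)]
    sin = [[math.sin(t * f) for f in freqs] for t in range(seq_len)]
    return cos, sin


class DynamicRotarySketch:
    """Hypothetical stand-in for a dynamically scaled rotary embedding."""

    def __init__(self, dim=8, max_pos=16, base=10000.0):
        self.dim, self.max_pos, self.base = dim, max_pos, base
        # Unscaled table, only valid for inputs up to max_pos tokens.
        self.cos_cached, self.sin_cached = cos_sin_table(
            max_pos, inv_freq(dim, base)
        )

    def tables(self, seq_len):
        if seq_len <= self.max_pos:
            # Short input: reuse the unscaled cache.
            return self.cos_cached[:seq_len], self.sin_cached[:seq_len]
        # Long input: the scale depends on the actual length, so this
        # branch runs for every input longer than max_pos.
        scale = seq_len / self.max_pos
        # Assumed NTK-style rule; the repository's formula may differ.
        scaled_base = self.base * scale ** (self.dim / (self.dim - 2))
        # Return fresh tables and leave the cache untouched, so later
        # short inputs still get unscaled values.
        return cos_sin_table(seq_len, inv_freq(self.dim, scaled_base))
```

With this layout, calling tables(4), then tables(64), then tables(4) again returns the same unscaled values for both short calls. If the long-input branch overwrote cos_cached and sin_cached instead, the second short call would pick up values scaled for length 64.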