jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License

Question related to _yarn_linear_ramp_mask #60

Open chizhang118 opened 2 months ago


I have a question about the `_yarn_linear_ramp_mask` implementation, specifically `linear_func = (torch.arange(dim, dtype=torch.float32) - min) / (max - min)`. Here the ramp is computed over the dimension index rather than over the number of rotations. However, when I checked the paper's definition of the ramp function, r, α, and β all seem to refer to the number of rotations rather than the dimension: r(d) = L/λ_d is the number of rotations a dimension completes over the context length L, and that is the quantity compared against α and β.
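
For reference, these are the paper's definitions I am comparing against, as I read them (b is the RoPE base, |D| the head dimension, L the original context length):

```math
\lambda_d = 2\pi b^{2d/|D|}, \qquad r(d) = \frac{L}{\lambda_d}, \qquad
\gamma(r) = \begin{cases} 0 & \text{if } r < \alpha \\ 1 & \text{if } r > \beta \\ \dfrac{r-\alpha}{\beta-\alpha} & \text{otherwise} \end{cases}
```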

So does the implementation match what the paper states?

Could anyone help me understand this part?
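
One way I tried to probe this numerically is the pure-Python sketch below. It assumes LLaMA-style values that are not stated above (head dimension 128, RoPE base 10000, original context 2048, α = 1, β = 32). All helper names are hypothetical. It reimplements the quoted `linear_func` ramp without `torch` and without rounding the thresholds. My further assumptions are that the ramp's endpoints come from solving r(d) = L/λ_d for d at r = β and r = α, and that 1 minus the ramp plays the role of γ; the quoted line alone does not show either of these.

```python
import math

# Assumed LLaMA-style values (not taken from the issue).
DIM = 128                # head dimension |D|
BASE = 10000.0           # RoPE base b
L = 2048                 # original context length
ALPHA, BETA = 1.0, 32.0  # rotation thresholds alpha and beta

def num_rotations(i, dim=DIM, base=BASE, ctx=L):
    """r(i) = L / lambda_i for frequency index i in [0, dim/2)."""
    wavelength = 2 * math.pi * base ** (2 * i / dim)
    return ctx / wavelength

def index_for_rotations(r, dim=DIM, base=BASE, ctx=L):
    """Hypothetical helper: fractional index i with num_rotations(i) == r."""
    return dim * math.log(ctx / (2 * math.pi * r)) / (2 * math.log(base))

def paper_gamma(r, alpha=ALPHA, beta=BETA):
    """Paper's ramp: linear in the rotation count r, clamped to [0, 1]."""
    return min(max((r - alpha) / (beta - alpha), 0.0), 1.0)

def index_ramp(i, lo, hi):
    """Pure-Python analogue of the quoted linear_func, clamped to [0, 1]."""
    return min(max((i - lo) / (hi - lo), 0.0), 1.0)

def index_gamma(i, alpha=ALPHA, beta=BETA):
    # More rotations means a smaller index, so beta maps to the lower endpoint.
    lo = index_for_rotations(beta)
    hi = index_for_rotations(alpha)
    # Assumption: 1 - ramp is the weight that plays the role of gamma.
    return 1.0 - index_ramp(i, lo, hi)

for i in range(0, DIM // 2, 4):
    r = num_rotations(i)
    print(f"i={i:2d}  r={r:9.3f}  paper={paper_gamma(r):.3f}  index={index_gamma(i):.3f}")
```

If I have the mapping right, the two ramps agree outside the transition band (γ = 1 for r ≥ β, γ = 0 for r ≤ α). Inside the band, however, the index ramp is linear in d, which makes it linear in log r rather than in r. For example, halfway between the two endpoints the index ramp gives 0.5, while r there is √32 ≈ 5.66, so the paper's γ would be about 0.15.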