Closed ganler closed 5 months ago
Hi @ganler, we simply changed rope_theta
to 16M, following this post. It would be interesting to try dynamic RoPE scaling without training on Llama3 models. We'll consider adding those results later.
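To give some intuition for what raising rope_theta does, here is a minimal sketch (our own illustration, not code from this repo) of the standard RoPE per-dimension frequencies, inv_freq[i] = theta ** (-2*i / dim). With a larger base such as 16M, every non-constant frequency rotates more slowly, so positional phases wrap around much later, which is one common intuition for why a larger base helps longer contexts:

```python
def rope_inv_freq(theta: float, dim: int) -> list[float]:
    # Standard RoPE frequencies: one rotation rate per pair of dimensions.
    # inv_freq[i] = theta ** (-2*i / dim), i = 0 .. dim/2 - 1
    return [theta ** (-2 * i / dim) for i in range(dim // 2)]

# Compare a 10k base against the 16M base mentioned above (head_dim=128
# is an assumption for illustration).
low = rope_inv_freq(10_000.0, 128)
high = rope_inv_freq(16_000_000.0, 128)
# apart from the i=0 term (always 1.0), every frequency shrinks with
# the larger base, i.e. the rotations are slower
```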
Thank you! We tried to use dynamic RoPE scaling and it significantly improved Llama3 models (https://evalplus.github.io/repoqa.html).
Do you have any hints as to why using a 16M rope_theta also works much better? Thanks!
Not sure which dynamic RoPE scaling technique you are referring to. In Hugging Face, we have dynamic NTK scaling here to dynamically increase the RoPE base based on the input sequence length, which is similar to directly changing rope_theta to a large value. As for why a large base is useful, there are plenty of papers investigating tricks for modifying RoPE: https://arxiv.org/pdf/2309.16039 https://arxiv.org/pdf/2310.05209 https://arxiv.org/pdf/2309.00071 https://arxiv.org/pdf/2402.13753
Dynamic RoPE scaling: https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/
Basically, it dynamically adjusts the scale factor to context_len / model_len when context_len > model_len. It seems to be the same thing your code is doing.
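The scale-factor idea above can be sketched in a few lines. This is a hedged illustration, not code from either project: the function name and the exact growth formula (scale raised to dim/(dim-2), modeled on the commonly described dynamic-NTK variant) are our assumptions.

```python
def dynamic_ntk_base(base: float, head_dim: int,
                     context_len: int, model_len: int) -> float:
    # Sketch of "dynamic" RoPE base scaling (our assumption of the formula):
    # within the trained length, keep the base untouched; beyond it, grow
    # the base by the scale factor context_len / model_len, raised to
    # head_dim / (head_dim - 2) as in the NTK-aware scaling recipe.
    if context_len <= model_len:
        return base
    scale = context_len / model_len
    return base * scale ** (head_dim / (head_dim - 2))

# Example (all numbers illustrative): a model trained at 8k context,
# queried at 32k, with 128-dim heads and the usual 10k base.
new_base = dynamic_ntk_base(10_000.0, 128, 32_768, 8_192)
# new_base > 10_000: the base grows with the context, which is the same
# effect as statically picking a large rope_theta up front
```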
"dynamically increase the RoPE base based on the input sequence length, which is similar to directly changing rope_theta to a large value"
I don't quite see the similarity, but thanks for the references; I will check them. Thank you!
Thanks for the great work!
From the README:
May I know the reason for adjusting rope_theta here rather than directly using, say, dynamic RoPE scaling? Thanks!