huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

add rope_theta config var for llama #173

Closed: jquesnelle closed this 4 months ago

jquesnelle commented 4 months ago

This allows changing RoPE's theta hyperparameter for Llama models. For example, Llama 3 uses theta = 500000 instead of the default 10000.
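For reference, a minimal sketch of how a configurable `rope_theta` feeds into the rotary embedding frequencies. This is not nanotron's actual implementation; the config class, field names, and defaults below are illustrative, with only the `rope_theta` name taken from the PR title.

```python
# Illustrative sketch: how a rope_theta config value changes RoPE frequencies.
# Not nanotron's code; the config class here is hypothetical.
from dataclasses import dataclass

import torch


@dataclass
class LlamaRotaryConfig:
    hidden_size_per_head: int = 128
    rope_theta: float = 10000.0  # Llama 1/2 default; Llama 3 uses 500000.0


def rope_inverse_frequencies(config: LlamaRotaryConfig) -> torch.Tensor:
    """Per-dimension inverse frequencies 1 / theta^(2i / d) used by RoPE."""
    dim = config.hidden_size_per_head
    exponent = torch.arange(0, dim, 2, dtype=torch.float32) / dim
    return 1.0 / (config.rope_theta ** exponent)


# A larger theta stretches the rotary wavelengths, which is why Llama 3's
# 500000 base differs from the earlier 10000 default.
default_freqs = rope_inverse_frequencies(LlamaRotaryConfig())
llama3_freqs = rope_inverse_frequencies(LlamaRotaryConfig(rope_theta=500000.0))
```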

xrsrke commented 4 months ago

Thanks for the PR. Merged!