karpathy / llama2.c

Inference Llama 2 in one file of pure C
MIT License

Code Llama rope_theta parameter #356

Open · janimo opened this issue 1 year ago

janimo commented 1 year ago

Base Llama models use a RoPE theta of 10000; Code Llama models use 1000000 to handle their larger context. With the value currently hardcoded in run.c, the code models tend to emit extra closing parentheses, whereas they work noticeably better if it is changed to 1000000.
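For context, run.c derives the per-dimension rotation frequency from this base, roughly `freq = 1 / theta^(head_dim / head_size)`. Here is a minimal standalone sketch of how the two theta values reshape that frequency spectrum (head_size = 64 and the sampling step are illustrative values, not read from any checkpoint):

```c
/* Standalone sketch: effect of rope_theta on RoPE frequencies.
   The formula matches the one run.c applies per pair of dimensions;
   head_size = 64 is just a typical value for these models. */
#include <stdio.h>
#include <math.h>

int main(void) {
    const int head_size = 64;
    const float thetas[2] = { 10000.0f, 1000000.0f }; /* base Llama 2 vs Code Llama */
    for (int t = 0; t < 2; t++) {
        printf("rope_theta = %.0f\n", thetas[t]);
        for (int head_dim = 0; head_dim < head_size; head_dim += 16) {
            float freq = 1.0f / powf(thetas[t], head_dim / (float)head_size);
            /* wavelength = positions per full rotation; a larger theta gives
               longer wavelengths in the high dims, i.e. more context reach */
            printf("  head_dim %2d: freq %.6g, wavelength ~%.0f positions\n",
                   head_dim, freq, 6.2831853f / freq);
        }
    }
    return 0;
}
```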

This may need to be added to Config, but that would introduce an incompatibility in the existing .bin files (and there are other fields that may also belong there, such as ffn_dim_multiplier, which is 1.3 for the CodeLlama-13 models).

Alternatively, it could be solved at inference time only, by adding a new rope_theta argument to run.c, as sketched below.
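A rough sketch of that inference-time approach, following run.c's existing single-letter flag style; the `-r` flag name and its plumbing here are assumptions for illustration, not an implemented option:

```c
/* Hypothetical sketch: expose rope_theta as a runtime override in run.c.
   The -r flag is an assumption; run.c's real flags are things like
   -t (temperature) and -n (steps). */
float rope_theta = 10000.0f; /* default matches the base Llama 2 models */
for (int i = 2; i < argc; i += 2) {
    /* ...existing flag handling... */
    if (argv[i][1] == 'r') { rope_theta = atof(argv[i + 1]); }
}

/* then in the forward pass, replace the hardcoded 10000.0f: */
float freq = 1.0f / powf(rope_theta, head_dim / (float)head_size);
```

This keeps the .bin files untouched, at the cost of the user having to know the right theta for each model.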

rdentato commented 1 year ago

Since we are migrating to a new file format, should the RoPE theta parameter be stored there?

karpathy commented 1 year ago

yep exactly, the v1+ header is large enough to incorporate additional hyperparameters like this.
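For reference, a minimal sketch of what that could look like, extending the current seven-field Config from run.c; the two float members are assumptions about what the v1+ header might carry, not its finalized layout:

```c
/* Sketch: the existing run.c Config, plus hypothetical v1+ header fields. */
typedef struct {
    int dim;        // transformer dimension
    int hidden_dim; // for ffn layers
    int n_layers;   // number of layers
    int n_heads;    // number of query heads
    int n_kv_heads; // number of key/value heads (can be < n_heads)
    int vocab_size; // vocabulary size
    int seq_len;    // max sequence length
    /* assumed additions, per this thread: */
    float rope_theta;         // 10000.0f base Llama 2, 1000000.0f Code Llama
    float ffn_dim_multiplier; // e.g. 1.3 where the model uses a wider FFN
} Config;
```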