Base Llama models use 10000 for RoPE theta; Code Llama models use 1000000 to handle their larger context.
With the value currently hardcoded in run.c, the code models tend to emit extra closing parentheses, whereas they work better if it is changed to 1000000.
This could be added to Config, but that would break compatibility with existing bin files (there are also other fields that might belong there, such as ffn_dim_multiplier, which is 1.3 for the CodeLlama-13 models).
Alternatively, it can be solved at inference time only, by adding a new rope_theta argument to run.c.