Once we have a LoRA example, we can also add an example of how to control extended context.
Some fine-tuned models have extended the context length by scaling RoPE. For example, if the original pre-trained model has a context length (max sequence length) of 4096 (4k) and the fine-tuned model has 32k, that is a scaling factor of 8, and it should work to set `--ctx-size` to 32768 (32k) and `--rope-scale` to 8.

`--rope-scale N`: where N is the linear scaling factor used by the fine-tuned model.
LLaMA C++ supports using different LoRA adapters for the same underlying pre-trained model. The relevant `llama-cli` flags are listed below.
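For illustration, an invocation that applies a single adapter at load time might look like the following sketch (the base model and adapter file names are hypothetical, and the adapter is assumed to already be in GGUF format):

```bash
# Load the base model and apply a LoRA adapter on top of it.
# Both file paths below are placeholders.
./llama-cli \
  -m ./models/base-model.gguf \
  --lora ./loras/my-adapter.gguf \
  -p "Write a haiku about llamas." \
  -n 64
```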