Once we have a LoRA example, we can also add an example of how to control extended context.
Some fine-tuned models have extended the context length by scaling RoPE. For example, if the original pre-trained model has a context length (max sequence length) of 4096 (4k) and the fine-tuned model has 32k, that is a scaling factor of 8, and it should work to set `--ctx-size` to 32768 (32k) and `--rope-scale` to 8.

`--rope-scale N`: where N is the linear scaling factor used by the fine-tuned model.
LLaMA C++ supports using different LoRA adapters for the same underlying pre-trained model. The relevant `llama-cli` flags are listed below.
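For illustration, an invocation that applies a single adapter at load time might look like the following sketch (the base model and adapter file names are hypothetical, and the adapter is assumed to already be in GGUF format):

```bash
# Load the base model and apply a LoRA adapter on top of it.
# Both file paths below are placeholders.
./llama-cli \
  -m ./models/base-model.gguf \
  --lora ./loras/my-adapter.gguf \
  -p "Write a haiku about llamas." \
  -n 64
```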