kaust-generative-ai / local-deployment-llama-cpp

Project to help you get started working with LLMs locally with LLaMA C++.
Apache License 2.0

LoRA adapter examples #15

Open davidrpugh opened 1 month ago

davidrpugh commented 1 month ago

LLaMA C++ supports applying different LoRA adapters to the same underlying pre-trained model. The following llama-cli flags are relevant (a usage sketch follows the list).

-   `--lora FNAME`: Apply a LoRA (Low-Rank Adaptation) adapter to the model (implies `--no-mmap`). This allows you to adapt the pre-trained model to specific tasks or domains.
-   `--lora-base FNAME`: Optional model to use as a base for the layers modified by the LoRA adapter. This flag is used in conjunction with `--lora` and specifies the base model for the adaptation.
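
A minimal sketch of how these flags might be combined. The model and adapter file names here are hypothetical placeholders; substitute your own GGUF files, and note that availability of `--lora-base` depends on your llama.cpp version.

```bash
# Hypothetical paths: models/base-model.gguf is the pre-trained model,
# adapters/task-lora.gguf is a LoRA adapter fine-tuned for a specific task.
llama-cli \
  --model models/base-model.gguf \
  --lora adapters/task-lora.gguf \
  --prompt "Translate the following sentence to French: Hello, world."

# When running a quantized base model, --lora-base can point at a
# higher-precision copy of the base for the layers the adapter modifies:
llama-cli \
  --model models/base-model-q4_k_m.gguf \
  --lora adapters/task-lora.gguf \
  --lora-base models/base-model-f16.gguf \
  --prompt "Translate the following sentence to French: Hello, world."
```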
davidrpugh commented 1 month ago

Once we have a LoRA example, we can also add an example of how to control the extended context size.

Extended Context Size

Some fine-tuned models extend the context length by scaling RoPE. For example, if the original pre-trained model has a context length (max sequence length) of 4096 (4k) tokens and the fine-tuned model supports 32k, that is a scaling factor of 8. Inference should then work by setting `--ctx-size` to 32768 (32k) and `--rope-scale` to 8.
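
A sketch of such an invocation, assuming a hypothetical fine-tune whose context was extended from 4k to 32k:

```bash
# Hypothetical model file; its training extended the context window from
# 4096 to 32768 tokens via RoPE scaling (32768 / 4096 = 8).
llama-cli \
  --model models/fine-tuned-32k.gguf \
  --ctx-size 32768 \
  --rope-scale 8 \
  --prompt "Summarize the key points of the report below."
```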