Answer.ai post - You can train a 70B-parameter model at home using FSDP and QLoRA
LoRA - low-rank adapters. These are small matrices added to the model; the rest of the model is kept constant, and only these small matrices are trained.
The intent is that everybody can contribute to the creation of models.
LoRA doesn’t train the whole large language model at all, but instead adds “adaptors”, which are very small matrices (generally smaller than 1% of the full model) that are trained, whilst keeping the rest of the model constant
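A minimal PyTorch sketch of the idea (the layer size, rank, and scaling below are illustrative assumptions, not from the post): wrap a frozen linear layer and add two small trainable matrices whose product has the same shape as the frozen weight.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: y = base(x) + scale * x A^T B^T."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the base model stays constant
        # two small matrices; r << min(in_features, out_features)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

layer = LoRALinear(nn.Linear(4096, 4096))
full = sum(p.numel() for p in layer.base.parameters())
lora = layer.A.numel() + layer.B.numel()
print(f"trainable adapter params: {lora} ({lora / full:.2%} of the frozen layer)")
```

For a 4096x4096 layer at rank 8 the adapter is about 0.4% of the layer's parameters, which matches the "smaller than 1% of the full model" claim above.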
Keep the base model quantized (and frozen during training); keep the adapters unquantized.
Tim Dettmers realized that LoRA can be combined with quantization: use a quantized base model, which is not changed at all by the training, and add trainable LoRA adaptors that are not quantized. This combination is QLoRA.
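A hedged sketch of that combination using the Hugging Face transformers, bitsandbytes, and peft libraries (the model name, rank, and target modules are placeholder choices; exact arguments vary by library version):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# quantized, frozen base model (4-bit NF4, as in the QLoRA paper)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# unquantized, trainable LoRA adapters on top
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; illustrative choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapters are trainable
```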
PEFT
Parameter-Efficient Fine-Tuning (PEFT) approaches deliver performance comparable to full fine-tuning while training only a small number of parameters.
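The same peft library works without quantization; a minimal sketch on GPT-2 (a small model chosen purely for illustration, with arbitrary hyperparameters) that counts how few parameters end up trainable:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    r=8, lora_alpha=16,
    target_modules=["c_attn"],  # GPT-2's fused attention projection
    fan_in_fan_out=True,        # c_attn is a Conv1D, so weights are transposed
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} ({trainable / total:.2%})")
```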
Fine-tune: minimal example using QLoRA - Colab
Fine-tune using Unsloth, with Colab examples
Very few lines of code + friendly to the GPU-poor + good performance
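A rough sketch of the Unsloth pattern (the model name and hyperparameters are placeholders, and the API may differ between Unsloth versions):

```python
from unsloth import FastLanguageModel

# load a 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-2-7b-bnb-4bit",  # placeholder; other supported bases work too
    max_seq_length=2048,
    load_in_4bit=True,
)

# attach LoRA adapters; the rest of the model stays frozen
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # illustrative choice
)
# `model` can then be trained with a standard HF/TRL trainer such as SFTTrainer
```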
Fine-tune your first LLM using torchtune
Reference: https://github.com/pytorch/torchtune
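The torchtune quickstart is CLI-driven and looks roughly like this (recipe and config names vary by torchtune version; `<HF_TOKEN>` is a placeholder for your Hugging Face token):

```bash
pip install torchtune

# download a base model from the Hugging Face Hub
tune download meta-llama/Llama-2-7b-hf \
  --output-dir /tmp/Llama-2-7b-hf \
  --hf-token <HF_TOKEN>

# run a built-in LoRA fine-tuning recipe on a single GPU
tune run lora_finetune_single_device --config llama2/7B_lora_single_device
```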
Source: Andrej Karpathy tweet
Source: Maxime Labonne post & another post
Fine-tune a GPT-2 model for spam classification
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch06/01_main-chapter-code/ch06.ipynb
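The notebook builds GPT-2 and the classification head from scratch; the same setup expressed with Hugging Face transformers instead (a different implementation than the notebook's, shown only to convey the idea of attaching a 2-class head):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# GPT-2 backbone with a freshly initialized 2-class head (spam / not spam)
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

texts = ["You won a free prize, click now!", "Are we still on for lunch?"]  # toy inputs
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([1, 0])  # 1 = spam, 0 = ham

loss = model(**batch, labels=labels).loss  # cross-entropy over the two classes
loss.backward()  # inside a real training loop, follow with an optimizer step
```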
Extended Guide: Instruction-tune Llama 2 - https://www.philschmid.de/instruction-tune-llama-2
ToC:
1. Define the use case and create a prompt template for instructions
2. Create an instruction dataset
3. Instruction-tune Llama 2 using trl and the SFTTrainer (with Flash Attention)
4. Test the model and run inference
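A hedged sketch of step 3 with trl's SFTTrainer (SFTTrainer/SFTConfig arguments have shifted across trl versions; the model name and prompt template here are placeholders):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# the guide builds its instruction dataset from Dolly
dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    # instruction/response prompt template in the guide's style (illustrative)
    return {"text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",  # placeholder; gated model, needs HF access
    train_dataset=dataset,
    args=SFTConfig(output_dir="llama2-instruct", dataset_text_field="text", max_seq_length=1024),
)
trainer.train()
```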
Finetuning Falcon efficiently - https://lightning.ai/pages/community/finetuning-falcon-efficiently/