unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Correct way to do LoRA instead of QLoRA? #858

Open fzyzcjy opened 3 months ago

fzyzcjy commented 3 months ago

Hi, thanks for the package! I want to play with LoRA on Llama 3.1 8B, but the tutorials at https://docs.unsloth.ai/get-started/unsloth-notebooks seem to only cover QLoRA. So I wonder what to do for LoRA?

My guess: in FastLanguageModel.from_pretrained, load a 16-bit model instead of a 4-bit one. But is there anything else I should do?

danielhanchen commented 3 months ago

@fzyzcjy load_in_4bit = False will enable 16bit!
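
Something along these lines should work (rough sketch only; the model name and max_seq_length are just placeholders):

```python
from unsloth import FastLanguageModel

# Load the base weights in 16-bit instead of a pre-quantized 4-bit checkpoint.
# load_in_4bit=False is the switch that makes this LoRA rather than QLoRA.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B",  # placeholder: any 16-bit checkpoint
    max_seq_length=2048,
    dtype=None,           # auto-picks float16 / bfloat16 for the GPU
    load_in_4bit=False,   # 16-bit base weights -> plain LoRA
)
```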

fzyzcjy commented 3 months ago

@danielhanchen Thank you!

So is that everything I need, or is there anything else I should take care of if I want to use LoRA?

For example, should I use the 4-bit bnb model, or the 16-bit model (https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)? (I guess the latter.) I also see that QLoRA seems to have multiple optimizations, like compressing things. So to use LoRA, should I disable some other flags?

It would be great to have a brief doc (or even a notebook) explaining this, since I guess not everyone using this package is an expert ;) I am happy to PR if needed.
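
To make the question concrete, this is roughly what I have after the from_pretrained call, copied from the QLoRA notebook; I'm not sure whether any of these flags are QLoRA-specific (values here are illustrative):

```python
# Adapter setup copied from the QLoRA notebook; unclear to me which of these,
# if any, only matter for the quantized case.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
    random_state=3407,
)
```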

fzyzcjy commented 3 months ago

More details: I changed

model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit", load_in_4bit=True,

to

model_name='meta-llama/Meta-Llama-3.1-8B', load_in_4bit=False,

and ran it. Surprisingly, the speed of LoRA is almost equivalent to QLoRA (so I suspect I may be doing something wrong, since LoRA is said to often be faster...).

P.S. both runs use

per_device_train_batch_size=4, gradient_accumulation_steps=4,
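
For completeness, the rest of the trainer setup is the stock notebook-style one and is identical for both runs; roughly this (dataset and a few values are placeholders):

```python
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

# Notebook-style trainer, identical for the LoRA and QLoRA runs;
# `dataset` is a placeholder for whatever data is being fine-tuned on.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=4,   # same in both runs
        gradient_accumulation_steps=4,   # same in both runs
        learning_rate=2e-4,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```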

danielhanchen commented 3 months ago

Oh, no worries on the model name change - we handle that internally - and yes, no speed changes; if anything, LoRA might be a bit faster since no dequantization is needed.

fzyzcjy commented 3 months ago

Thank you!

> and yes no speed changes

Hmm, I am a bit confused... e.g. https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/lora-qlora says "LoRA is about 66% faster than QLoRA in terms of tuning speed", etc.

monk1337 commented 2 months ago

This is interesting, I was looking into the same thing. This is my config - do I need to change anything else? optim: "adamw_8bit"?


```yaml
# Model configuration
model:
  name: "meta-llama/Meta-Llama-3.1-8B"
  max_seq_length: 1000
  load_in_4bit: false

# LoRA configuration
lora:
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  alpha: 64
  dropout: 0
  rank: 64
  use_rslora: false
  loftq_config: null

# Training configuration
training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 2
  warmup_ratio: 0.1
  num_train_epochs: 2
  learning_rate: 5e-6
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
  logging_steps: 1
  weight_decay: 0.0
  max_length: 512
  max_prompt_length: 512
  use_gradient_checkpointing: "unsloth"
  bias: "none"
  beta: 0.1
  seed: 42
  output_dir: "outputs"

# Dataset configuration
dataset:
  sources:
    "huggingface/default_data": 1.0
  splits:
    - "train"
    - "test"
  num_proc: 12

# Random seed
seed: 42
```
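
If it helps, this is roughly how I'm mapping that config onto the actual calls (a sketch only; the config path and the dataset/trainer handling are omitted or made up here for illustration):

```python
import yaml
from unsloth import FastLanguageModel

# Hypothetical path; just illustrating how the YAML above feeds the calls.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=cfg["model"]["name"],
    max_seq_length=cfg["model"]["max_seq_length"],
    load_in_4bit=cfg["model"]["load_in_4bit"],  # false -> 16-bit base, plain LoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=cfg["lora"]["rank"],
    lora_alpha=cfg["lora"]["alpha"],
    lora_dropout=cfg["lora"]["dropout"],
    target_modules=cfg["lora"]["target_modules"],
    use_rslora=cfg["lora"]["use_rslora"],
    loftq_config=cfg["lora"]["loftq_config"],
    bias=cfg["training"]["bias"],
    use_gradient_checkpointing=cfg["training"]["use_gradient_checkpointing"],
    random_state=cfg["seed"],
)
```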