fzyzcjy opened this issue 3 months ago
@fzyzcjy `load_in_4bit = False` will enable 16-bit!
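In case it helps, the loading call is the only thing that changes relative to the QLoRA notebook; a minimal sketch (the rank/alpha values below are just placeholders, not recommendations):

```python
from unsloth import FastLanguageModel

# load_in_4bit=False skips the bitsandbytes quantization, so the adapters
# below train against 16-bit base weights (plain LoRA instead of QLoRA).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="meta-llama/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    dtype=None,           # auto-detects bfloat16 / float16
    load_in_4bit=False,   # the only switch that matters for LoRA vs QLoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                 # placeholder rank
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
    random_state=42,
)
```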
@danielhanchen Thank you!
So is that everything I need, or is there anything else I should take care of when using LoRA?
For example, should I use the 4-bit bnb model, or the 16-bit model (https://huggingface.co/meta-llama/Meta-Llama-3.1-8B)? (I guess the latter.) Also, QLoRA seems to involve several extra optimizations such as weight compression, so do I need to disable any other flags to get plain LoRA?
It would be great to have a brief doc (or even a notebook) explaining this, since not everyone using this package is an expert ;) I am happy to open a PR if needed.
More details: I changed
`model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit", load_in_4bit=True,`
to
`model_name="meta-llama/Meta-Llama-3.1-8B", load_in_4bit=False,`
and ran it. Surprisingly, the speed of LoRA is almost equivalent to QLoRA (so I suspect I may be doing something wrong, since LoRA is said to often be faster...).
P.S. both runs use
`per_device_train_batch_size=4, gradient_accumulation_steps=4,`
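For completeness, the rest of the setup follows the notebook, roughly like this. The toy dataset below is just a stand-in for my real data, and depending on your TRL version the `dataset_text_field` / `max_seq_length` kwargs may need to move into an `SFTConfig` instead:

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import is_bfloat16_supported

# Toy stand-in dataset with a "text" column.
dataset = Dataset.from_dict({"text": ["example sequence"] * 256})

trainer = SFTTrainer(
    model=model,          # from FastLanguageModel.from_pretrained / get_peft_model
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,   # effective batch size 16
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        output_dir="outputs",
        report_to="none",
    ),
)
trainer_stats = trainer.train()
```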
Oh, no worries about the model name change - we handle that internally - and yes, no real speed changes; LoRA will probably be a bit faster since no dequantization is needed.
Thank you!
> and yes no speed changes

Hmm, I am a bit confused... e.g. https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/lora-qlora says "LoRA is about 66% faster than QLoRA in terms of tuning speed", etc.
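For what it's worth, this is roughly how I compared the two: run the same trainer sketch as above once with `load_in_4bit=True` and once with `load_in_4bit=False` (each in a fresh process, so GPU memory from the first run does not skew the second), then compare what `trainer.train()` reports:

```python
# TrainOutput.metrics from trainer.train() carries the timing numbers.
trainer_stats = trainer.train()
print(trainer_stats.metrics["train_runtime"])             # wall-clock seconds
print(trainer_stats.metrics["train_samples_per_second"])  # throughput, higher is faster
```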
This is interesting, I was also looking into the same thing. This is my config - do I need to change anything else, e.g. `optim: "adamw_8bit"`?
```yaml
# Model configuration
model:
  name: "meta-llama/Meta-Llama-3.1-8B"
  max_seq_length: 1000
  load_in_4bit: false

# LoRA configuration
lora:
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
    - "gate_proj"
    - "up_proj"
    - "down_proj"
  alpha: 64
  dropout: 0
  rank: 64
  use_rslora: false
  loftq_config: null

# Training configuration
training:
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 2
  warmup_ratio: 0.1
  num_train_epochs: 2
  learning_rate: 5e-6
  optim: "adamw_8bit"
  lr_scheduler_type: "linear"
  logging_steps: 1
  weight_decay: 0.0
  max_length: 512
  max_prompt_length: 512
  use_gradient_checkpointing: "unsloth"
  bias: "none"
  beta: 0.1
  seed: 42
  output_dir: "outputs"

# Dataset configuration
dataset:
  sources:
    "huggingface/default_data": 1.0
  splits:
    - "train"
    - "test"
  num_proc: 12

# Random seed
seed: 42
```
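For reference, this is roughly how I feed that config into Unsloth. The `config.yaml` file name and the loading glue are my own code; only the `FastLanguageModel` calls are the library API:

```python
import yaml
from unsloth import FastLanguageModel

# My own glue: read the YAML above (assumed to be saved as config.yaml).
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=cfg["model"]["name"],
    max_seq_length=cfg["model"]["max_seq_length"],
    load_in_4bit=cfg["model"]["load_in_4bit"],   # false -> 16-bit weights, plain LoRA
)

model = FastLanguageModel.get_peft_model(
    model,
    r=cfg["lora"]["rank"],
    lora_alpha=cfg["lora"]["alpha"],
    lora_dropout=cfg["lora"]["dropout"],
    target_modules=cfg["lora"]["target_modules"],
    bias=cfg["training"]["bias"],
    use_gradient_checkpointing=cfg["training"]["use_gradient_checkpointing"],
    use_rslora=cfg["lora"]["use_rslora"],
    loftq_config=cfg["lora"]["loftq_config"],
    random_state=cfg["seed"],
)
```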
Hi, thanks for the package! I want to play with LoRA on Llama 3.1 8B, but the tutorials at https://docs.unsloth.ai/get-started/unsloth-notebooks seem to only cover QLoRA. So I wonder what to do for LoRA?
My guess: in `FastLanguageModel.from_pretrained`, load a 16-bit model instead of a 4-bit one. But is there anything else I should do?