ArvindSharma18 opened 5 months ago
I'll check this out! So sorry about the issue!
Thanks for such a quick response, appreciate it!
I am having the same issue on my local RTX A4000 rig, just trying a 0.5B Qwen PEFT run... CUDA out of memory even though it's only using 3 GB / 16 GB..
nvm, my issue is related to this
Hello, any updates on this? I am very keen to try different alignment techniques using Unsloth!
Apologies, my brother and I relocated to SF, so I'm just getting back to GitHub issues! Llama-3 in general has a much larger vocab size, so it might be OOMing for DPO / ORPO compared to Mistral. I could try reducing VRAM usage further, but in the meantime I would advise reducing max_length = 2048 to something smaller, and max_prompt_length = 1024 similarly.
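In code, that suggestion amounts to something like the sketch below. The numbers are illustrative, and it assumes a TRL version where DPOConfig carries the length fields (older versions passed them to DPOTrainer directly):

from trl import DPOConfig

dpo_args = DPOConfig(
    max_length = 1024,                 # total prompt + completion tokens, down from 2048
    max_prompt_length = 512,           # prompt tokens alone, down from 1024
    per_device_train_batch_size = 1,
    gradient_accumulation_steps = 8,   # recover the effective batch size
    output_dir = "outputs",
)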
Hi, I have the same issue even with max_length < 1000 and max_prompt_length = 512. I have also tried Gemma 2 (a bigger model), but I am again unable to run DPO or ORPO with minimal configs. I am really interested in Llama 3 or Gemma with DPO and ORPO.. Any guidance?
Ye I can reproduce in a free Colab - it seems like there really is a lot of VRAM usage hmmm
Also finding the same issue with ORPO
Hey @danielhanchen! Hope you are doing well! I tried ORPO again with my own curated dataset (<1000 samples) and a Qwen 3B model with QLoRA (unsloth/qwen2.5-3b-bnb-4bit), and hit the same OOM error (free Colab T4 environment) :( I am very keen to see and learn how ORPO compares to normal SFT. I know you are very busy and doing such a great job with this repo; if you find some time, could you kindly look into this issue? My config for reference:
from trl import ORPOTrainer, ORPOConfig
from unsloth import is_bfloat16_supported

orpo_trainer = ORPOTrainer(
    model = model,
    train_dataset = dataset_train,
    tokenizer = tokenizer,
    args = ORPOConfig(
        max_length = 3512,               # total tokens: prompt + completion
        max_prompt_length = 512,
        max_completion_length = 3000,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        beta = 0.2,                      # weight of the odds-ratio term in the ORPO loss
        logging_steps = 1,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        num_train_epochs = 1,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        output_dir = "outputs",
    ),
)
Edit 1: Also tried lower sequence lengths; the minimum I can go down to is 2048, and I still get the same issue.
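For context on why this config is heavy: TRL's ORPO runs the chosen and rejected completions through the model together, so the effective batch is doubled, and the logits tensor grows with sequence length times Qwen's large vocabulary; at max_length = 3512 that alone is several GB in fp16, before activations. A more T4-friendly variant might look like the sketch below (the reduced numbers are illustrative assumptions, not tested values):

from trl import ORPOConfig
from unsloth import is_bfloat16_supported

t4_friendly_args = ORPOConfig(
    max_length = 1024,                 # prompt + completion, down from 3512
    max_prompt_length = 512,
    max_completion_length = 512,       # down from 3000
    per_device_train_batch_size = 1,   # ORPO doubles this internally (chosen + rejected)
    gradient_accumulation_steps = 8,   # keep the effective batch size at 8
    beta = 0.2,
    optim = "adamw_8bit",
    fp16 = not is_bfloat16_supported(),
    bf16 = is_bfloat16_supported(),
    output_dir = "outputs",
)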
I have followed the sample Colab with my custom dataset (<100 samples). With the same configs as in the sample Colab (loading the model in 4-bit, dtype = None, and the other PEFT and trainer configs), I hit an OutOfMemoryError. Even with a batch size of 1 and some config changes, like reducing the target modules, the same issue persists.
Environment: Google Colab T4 GPU
PEFT config, DPO config, and the error message for DPO were attached as images.
Same OOM error for ORPO was observed.
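Since the attached configs are images, here is a hedged reconstruction of what the sample Colab's setup roughly looks like; the model name and values below are illustrative, not the poster's exact ones. The main VRAM levers on a 16 GB T4 are 4-bit loading, Unsloth's gradient checkpointing, and a modest max_seq_length:

from unsloth import FastLanguageModel, PatchDPOTrainer
PatchDPOTrainer()  # patch TRL's DPOTrainer for Unsloth before constructing it

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # illustrative; any 4-bit Unsloth model
    max_seq_length = 1024,   # keep this small on a 16 GB T4
    dtype = None,            # auto-detect (fp16 on T4)
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # offloads activations to save VRAM
)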