ChenDRAG opened this issue 8 months ago
I am experiencing similar behaviour: the training loss values show considerable fluctuations, as you can see below.
Here is my code. Is there something wrong with the training parameters that could cause this behaviour?
from transformers import TrainingArguments
from trl import DPOTrainer
training_arguments = TrainingArguments(
output_dir="results",
num_train_epochs=5,
per_device_train_batch_size=8,
gradient_accumulation_steps=4,
optim='paged_adamw_32bit',
save_steps=10000000000,
logging_steps=10,
learning_rate=2e-4,
weight_decay=2e-4,
# fp16 = False, # Set fp16 to False
bf16=True,
max_grad_norm=0.3,
warmup_ratio=0.03,
lr_scheduler_type='constant',
save_strategy = "no",
gradient_checkpointing=True,
gradient_checkpointing_kwargs={"use_reentrant":False},
remove_unused_columns=False
)
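# Note: the effective batch size per optimizer step is
# per_device_train_batch_size * gradient_accumulation_steps (times the number of GPUs),
# i.e. 8 * 4 = 32 per device with this config.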
import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # make CUDA kernel launches synchronous for easier debugging
trainer = DPOTrainer(
model=peft_model,
ref_model=None,
model_init_kwargs=None,
ref_model_init_kwargs=None,
tokenizer=tokenizer,
args=training_arguments,
beta=0.1,
loss_type="sigmoid",
train_dataset=formatted_train_data,
eval_dataset=None, # Provide eval dataset if available
max_length=max_seq_length,
peft_config=peft_config,
)
trainer.train()
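For reference, with beta=0.1 and loss_type="sigmoid" the trainer optimizes the standard DPO sigmoid loss on the policy/reference log-ratios. A minimal sketch of that computation (the function and tensor names are illustrative, not the trainer's internals):

import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # log-ratio of policy vs. reference for the chosen and rejected completions
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # -log sigmoid(beta * (chosen log-ratio - rejected log-ratio))
    return -F.logsigmoid(beta * (chosen_logratios - rejected_logratios)).mean()

Since this is averaged over per-batch preference margins, some step-to-step fluctuation of the logged loss is expected; the question is whether the swings here are larger than normal.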
@lewtun we really need your input here please
Hi, I would like to draw some attention to issue #38.
It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which seems weird. (I used a LoRA model with a global batch size of 64, multi-GPU acceleration on 8 GPUs, a learning rate of 1e-4, and the other settings as suggested.)
Meanwhile, full-parameter fine-tuning (official settings) shows no such problem.
I don't know whether this is normal; I assume it is a bug associated with the LoRA model. Is there any explanation? Has anyone encountered the same issue? If your loss is normal when you rerun, could you share your configs?
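For context on what the reference model is in the LoRA run (my understanding, please correct me if I am wrong): with ref_model=None and a PEFT model, DPOTrainer obtains the reference log-probs from the same model with the adapters temporarily disabled, roughly like the sketch below (not the trainer's exact code; `batch` is a placeholder for a tokenized input).

import torch

with torch.no_grad():
    with peft_model.disable_adapter():  # PEFT context manager: run the frozen base weights
        ref_logits = peft_model(**batch).logits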