huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Weird DPO loss #46

Open ChenDRAG opened 8 months ago

ChenDRAG commented 8 months ago

Hi, I would like to draw some attention to issue #38.

It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which looks weird. (I used a LoRA model with a global batch size of 64 and multi-GPU acceleration across 8 GPUs, learning rate 1e-4; other settings were as suggested.)

Meanwhile, full-parameter fine-tuning (official settings) has no such problem.

[Image: training loss curves; the DPO-LoRA loss (red line) drops abruptly at each epoch boundary]

I don't know whether this is normal, and I assume it is a bug associated with the LoRA model. Is there an explanation? Has anyone encountered the same issue? If your rerun's loss looks normal, could you share your configs?
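For reference, here is the quantity being plotted. This is a minimal PyTorch sketch of the standard sigmoid DPO loss that TRL optimizes (the helper name and the numbers are illustrative only, not the library's actual code):

import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Once the policy ranks a pair with a large margin, the loss on that pair
# is already close to zero when the pair is revisited, e.g.:
loss = dpo_sigmoid_loss(torch.tensor([-5.0]), torch.tensor([-25.0]),
                        torch.tensor([-10.0]), torch.tensor([-12.0]))
print(loss)  # ~0.15 with beta=0.1

One possible reading of the epoch-boundary step, then, is memorization: once the model has fit the pairs seen in epoch one, revisiting them drives the per-pair loss toward zero from the very first steps of epoch two.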

JhonDan1999 commented 2 months ago

I am experiencing similar behaviour: the training loss values show considerable fluctuations, as you can see below.

[Screenshot 2024-05-28: training loss curve showing large fluctuations]

Here is my code. Is there something wrong with the training parameters that caused this behaviour?


import os

# CUDA_LAUNCH_BLOCKING only takes effect if set before CUDA is initialized,
# so export it at the top of the script (it is a debugging aid; remove it
# for full-speed training).
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

from transformers import TrainingArguments
from trl import DPOTrainer

training_arguments = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim='paged_adamw_32bit',
    save_steps=10000000000,  # effectively never saves on a step schedule
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=2e-4,
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type='constant',
    save_strategy="no",  # checkpointing disabled entirely
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    remove_unused_columns=False,  # DPOTrainer needs the extra dataset columns
)


trainer = DPOTrainer(
    model=peft_model,
    ref_model=None,  # with a PEFT model, TRL uses the base model (adapters disabled) as the reference
    model_init_kwargs=None,
    ref_model_init_kwargs=None,
    tokenizer=tokenizer,
    args=training_arguments,
    beta=0.1,  # strength of the implicit KL penalty
    loss_type="sigmoid",  # the standard DPO loss
    train_dataset=formatted_train_data,
    eval_dataset=None,  # provide an eval dataset if available
    max_length=max_seq_length,
    peft_config=peft_config,
)

trainer.train()
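
One way to probe the fluctuations is to watch the reward statistics that DPOTrainer logs alongside the loss (in the TRL versions I have used these appear under rewards/chosen, rewards/rejected, rewards/accuracies, and rewards/margins; the key names may differ in other releases). A minimal sketch using a standard transformers TrainerCallback, attached before calling trainer.train():

from transformers import TrainerCallback

class RewardMarginLogger(TrainerCallback):
    # Print the DPO reward margin and ranking accuracy at every logging step.
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "rewards/margins" in logs:
            print(f"step {state.global_step}: "
                  f"margin={logs['rewards/margins']:.3f} "
                  f"acc={logs.get('rewards/accuracies', float('nan')):.3f}")

trainer.add_callback(RewardMarginLogger())
trainer.train()

If the margin climbs steadily while the loss oscillates, the noise more likely comes from batch composition than from the model diverging; a constant learning rate of 2e-4 with no decay is also fairly aggressive for DPO, so a lower rate or a decaying schedule may be worth trying.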

@lewtun, we really need your input here, please.