huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Weird DPO loss #46

Open ChenDRAG opened 8 months ago

ChenDRAG commented 8 months ago

Hi, I would like to draw some attention to issue #38.

It seems that the DPO-LoRA training loss (red line) drops abruptly at the beginning of each epoch, which looks weird. (I used a LoRA model with a global batch size of 64 and multi-GPU acceleration across 8 GPUs, learning rate 1e-4; other settings were as suggested.)

Meanwhile, full-parameter fine-tuning (official settings) has no such problem.

[Image: training loss curves; the DPO-LoRA loss (red line) drops abruptly at each epoch boundary]

I don't know whether this is normal, and I assume it is a bug associated with the LoRA model. Is there an explanation? Has anyone encountered the same issue? If your rerun's loss looks normal, could you share your configs?
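For reference, here is the quantity being plotted. This is a minimal PyTorch sketch of the standard sigmoid DPO loss that TRL optimizes (the helper name and the numbers are illustrative only, not the library's actual code):

import torch
import torch.nn.functional as F

def dpo_sigmoid_loss(policy_chosen_logps, policy_rejected_logps,
                     ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # DPO loss: -log sigmoid(beta * (policy log-ratio - reference log-ratio))
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Once the policy ranks a pair with a large margin, the loss on that pair
# is already close to zero when the pair is revisited, e.g.:
loss = dpo_sigmoid_loss(torch.tensor([-5.0]), torch.tensor([-25.0]),
                        torch.tensor([-10.0]), torch.tensor([-12.0]))
print(loss)  # ~0.15 with beta=0.1

One possible reading of the epoch-boundary step, then, is memorization: once the model has fit the pairs seen in epoch one, revisiting them drives the per-pair loss toward zero from the very first steps of epoch two.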

JhonDan1999 commented 2 months ago

I am experiencing similar behaviour: the training loss values show considerable fluctuations, as you can see below.

[Screenshot 2024-05-28: training loss curve showing large fluctuations]

Here is my code. Is there something wrong with the training parameters that caused this behaviour?


import os

# CUDA_LAUNCH_BLOCKING only takes effect if set before CUDA is initialized,
# so export it at the top of the script (it is a debugging aid; remove it
# for full-speed training).
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'

from transformers import TrainingArguments
from trl import DPOTrainer

training_arguments = TrainingArguments(
    output_dir="results",
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    optim='paged_adamw_32bit',
    save_steps=10000000000,  # effectively never saves on a step schedule
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=2e-4,
    bf16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type='constant',
    save_strategy="no",  # checkpointing disabled entirely
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    remove_unused_columns=False,  # DPOTrainer needs the extra dataset columns
)


trainer = DPOTrainer(
    model=peft_model,
    ref_model=None,  # with a PEFT model, TRL uses the base model (adapters disabled) as the reference
    model_init_kwargs=None,
    ref_model_init_kwargs=None,
    tokenizer=tokenizer,
    args=training_arguments,
    beta=0.1,  # strength of the implicit KL penalty
    loss_type="sigmoid",  # the standard DPO loss
    train_dataset=formatted_train_data,
    eval_dataset=None,  # provide an eval dataset if available
    max_length=max_seq_length,
    peft_config=peft_config,
)

trainer.train()
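
One way to probe the fluctuations is to watch the reward statistics that DPOTrainer logs alongside the loss (in the TRL versions I have used these appear under rewards/chosen, rewards/rejected, rewards/accuracies, and rewards/margins; the key names may differ in other releases). A minimal sketch using a standard transformers TrainerCallback, attached before calling trainer.train():

from transformers import TrainerCallback

class RewardMarginLogger(TrainerCallback):
    # Print the DPO reward margin and ranking accuracy at every logging step.
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs and "rewards/margins" in logs:
            print(f"step {state.global_step}: "
                  f"margin={logs['rewards/margins']:.3f} "
                  f"acc={logs.get('rewards/accuracies', float('nan')):.3f}")

trainer.add_callback(RewardMarginLogger())
trainer.train()

If the margin climbs steadily while the loss oscillates, the noise more likely comes from batch composition than from the model diverging; a constant learning rate of 2e-4 with no decay is also fairly aggressive for DPO, so a lower rate or a decaying schedule may be worth trying.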

@lewtun, we really need your input here, please.