huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4

Multi-GPU Training with DPO Full Parameter Gets Stuck #147

Open Taishi-N324 opened 3 months ago

Taishi-N324 commented 3 months ago

Environment:

- transformers: 4.39.0.dev0
- trl: 0.7.10
- torch: 2.2.2
- hardware: 8 x H100 (80GB)
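
The versions above can be confirmed with a check along these lines (a minimal sketch; output will differ per machine):

import torch
import transformers
import trl

# Print library versions and the number of visible CUDA devices
print("transformers:", transformers.__version__)
print("trl:", trl.__version__)
print("torch:", torch.__version__)
print("GPUs:", torch.cuda.device_count())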

I am encountering an issue where DPO full-parameter training on a multi-GPU setup gets stuck. The problem arises when I launch training through the accelerate CLI with DeepSpeed's ZeRO-3 configuration.

Steps to Reproduce:

Clone the Alignment Handbook repository:

git clone https://github.com/huggingface/alignment-handbook.git
cd alignment-handbook

Install dependencies:

pip install wheel
python -m pip install .
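
A quick sanity check that the handbook's package installed correctly (I am assuming from the repository layout that it imports as alignment; the printed path will vary by machine):

python -c "import alignment; print(alignment.__file__)"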

Launch the training script with the specified configuration:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/deepspeed_zero3.yaml scripts/run_dpo.py recipes/zephyr-7b-beta/dpo/config_full.yaml
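
The hang occurs during trainer construction rather than at the launch command itself. For illustration only, here is a minimal sketch of the DPOTrainer call pattern that produces the warning quoted below; the model_id, dataset, and hyperparameters are placeholders, not the values from config_full.yaml:

from datasets import Dataset
from transformers import AutoTokenizer, TrainingArguments
from trl import DPOTrainer

# Placeholder model; the real run uses the model set in the recipe config
model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

# Tiny illustrative preference dataset with the prompt/chosen/rejected schema
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO stand for?"],
    "chosen": ["Direct Preference Optimization."],
    "rejected": ["I am not sure."],
})

trainer = DPOTrainer(
    model=model_id,      # a string model_id: DPOTrainer instantiates the model itself
    ref_model=model_id,  # the reference model is also created inside the trainer
    beta=0.01,
    args=TrainingArguments(output_dir="dpo-out",
                           per_device_train_batch_size=1,
                           remove_unused_columns=False),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()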

Expected vs. Actual Behavior:

Expected: Multi-GPU training runs without interruption.
Actual: The process hangs immediately after displaying the user warning:

UserWarning: You passed a model_id to the DPOTrainer. This will automatically create an `AutoModelForCausalLM` or a `PeftModel` (if you passed a `peft_config`) for you.

After this warning, there is no further progress.
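
For what it's worth, and not a confirmed fix: the warning can be avoided entirely by instantiating the policy and reference models up front instead of passing strings, which takes DPOTrainer's internal model creation out of the picture. A sketch, continuing with the same placeholders as above:

import torch
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-v0.1"  # placeholder, as in the sketch above
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
train_dataset = Dataset.from_dict({
    "prompt": ["What does DPO stand for?"],
    "chosen": ["Direct Preference Optimization."],
    "rejected": ["I am not sure."],
})

# Instantiate both models up front so DPOTrainer's internal
# AutoModelForCausalLM creation path (the one the warning refers to)
# is never taken.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
ref_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,
    beta=0.01,
    args=TrainingArguments(output_dir="dpo-out",
                           per_device_train_batch_size=1,
                           remove_unused_columns=False),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    max_length=512,
    max_prompt_length=128,
)
trainer.train()

If the hang persists even with instantiated models, that would point away from the auto-creation path and toward the ZeRO-3 setup itself.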