huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

RLOOTrainer bug when using DeepSpeed #2329

Open macheng6 opened 1 week ago

macheng6 commented 1 week ago

System Info

When using DeepSpeed, RLOOTrainer raises an error: "ValueError: Please make sure to properly initialize your accelerator via accelerator = Accelerator() before using any functionality from the accelerate library." This is likely because the Accelerator is not properly initialized at line 120 of the RLOOTrainer code, possibly because the deepspeed_plugin is not passed in.

trl: 0.12.0.dev0
transformers: 4.45.2
accelerate: 1.0.1

Reproduction

accelerate launch --config_file examples/accelerate_configs/deepspeed_zero3.yaml \
    examples/scripts/rloo/rloo.py \
    --dataset_name trl-internal-testing/descriptiveness-sentiment-trl-style \
    --dataset_train_split descriptiveness \
    --output_dir models/minimal/rloo \
    --rloo_k 2 \
    --num_ppo_epochs 1 \
    --num_mini_batches 1 \
    --learning_rate 3e-6 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --total_episodes 10000 \
    --model_name_or_path EleutherAI/pythia-1b-deduped \
    --sft_model_path EleutherAI/pythia-1b-deduped \
    --reward_model_path EleutherAI/pythia-1b-deduped \
    --local_rollout_forward_batch_size 1 \
    --missing_eos_penalty 1.0

Expected behavior

RLOOTrainer should run under DeepSpeed without raising this ValueError.

naskimed commented 5 days ago

Hey, I have the same issue using PPOTrainer: "ValueError: Please make sure to properly initialize your accelerator via accelerator = Accelerator() before using any functionality from the accelerate library".

trl: 0.13.0.dev0
transformers: 4.46.2
accelerate: 1.1.0.dev0

KAKSIS commented 4 days ago

I have the same problem

kongjiellx commented 4 days ago

+1 with PPOTrainer

macheng6 commented 1 day ago

With the following versions, the code runs: trl==0.11.4, accelerate==0.33.0.
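For anyone checking whether their environment matches the versions macheng6 reports as working, here is a minimal sketch using only the standard library. The `REPORTED_WORKING` dict and the helper names `installed_versions` and `mismatches` are made up for illustration, and the pins come only from this thread, not from an official compatibility matrix.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins reported as working in this thread (assumption: not an official matrix).
REPORTED_WORKING = {"trl": "0.11.4", "accelerate": "0.33.0"}


def installed_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(installed, expected):
    """Return {package: (installed, expected)} for every package that differs."""
    return {
        name: (installed.get(name), want)
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    current = installed_versions(REPORTED_WORKING)
    diff = mismatches(current, REPORTED_WORKING)
    if not diff:
        print("Environment matches the reported working versions.")
    for name, (have, want) in diff.items():
        print(f"{name}: installed {have}, reported working {want}")
```

Running the script lists any package whose installed version differs from the reported pins. It only compares version strings and says nothing about whether other combinations also work.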