huggingface / trl

Train transformer language models with reinforcement learning.
http://hf.co/docs/trl
Apache License 2.0

Policy has no attribute 'zero_gather_16bit_weights_on_model_save' #2122

Closed Constructor-Sun closed 2 days ago

Constructor-Sun commented 3 days ago

Reproduction

When I tried to use PPOv2 for my own task, I hit the following error during checkpoint saving:

 50%|█████     | 1/2 [02:29<01:15, 75.62s/it]
100%|██████████| 2/2 [02:29<00:00, 74.59s/it]Traceback (most recent call last):
[rank0]: Traceback (most recent call last):
[rank0]:   File "/mnt/data2/haiying/my_project/train/ppo2_train.py", line 190, in <module>
[rank0]:     trainer.train()
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/trl/trainer/ppov2_trainer.py", line 555, in train
[rank0]:     self._save_checkpoint(model, trial=None, metrics=metrics)
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/transformers/trainer.py", line 2886, in _save_checkpoint
[rank0]:     self.save_model(output_dir, _internal_call=True)
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/trl/trainer/ppov2_trainer.py", line 251, in save_model
[rank0]:     super().save_model(output_dir, _internal_call)
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/transformers/trainer.py", line 3439, in save_model
[rank0]:     state_dict = self.accelerator.get_state_dict(self.deepspeed)
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/accelerate/accelerator.py", line 3320, in get_state_dict
[rank0]:     if model.zero_gather_16bit_weights_on_model_save():
[rank0]:   File "/home/haiying/anaconda3/envs/my_project/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
[rank0]:     raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[rank0]: AttributeError: 'MistralForCausalLM' object has no attribute 'zero_gather_16bit_weights_on_model_save'

Here MistralForCausalLM is my policy.

I used the following command to run the examples/scripts/ppo/ppo_tldr.py script (in which I only changed the dataset loading; nothing else):

accelerate launch --config_file train/ppo_acc_config.yaml \
    train/ppo_tldr.py  \
    --output_dir models/CLM-ppov2 \
    --model_name_or_path models/sft-model-7b \
    --sft_model_path models/sft-model-7b \
    --reward_model_path models/critic-model-2b \
    --dataset inventory_train/rl_data.json \
    --trust_remote_code true \
    --learning_rate 3e-6 \
    --max_length 1024 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --total_episodes 10 \
    --local_rollout_forward_batch_size 2 \
    --missing_eos_penalty 1.0 \
    --stop_token eos

(Here I set total_episodes to 10 just to save time, since the error occurs at the checkpoint-saving stage.) The accelerate config file is:

compute_environment: LOCAL_MACHINE
debug: false
deepspeed_config:
  deepspeed_multinode_launcher: standard
  offload_optimizer_device: cpu
  offload_param_device: cpu
  zero3_init_flag: false
  zero3_save_16bit_model: true
  zero_stage: 3
distributed_type: DEEPSPEED
downcast_bf16: 'no'
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

It seems that this error occurs when I use ZeRO-3 with bf16 mixed precision. I tried switching to ZeRO-2, but that only produced an out-of-memory error. Meanwhile, I found out that zero_gather_16bit_weights_on_model_save is a method of the DeepSpeedEngine class. Could someone help me solve or analyze this issue? I have been struggling with it for several days.😭😭
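For what it's worth, a quick sanity check (my own sketch; model and ds_config here stand for an arbitrary model and DeepSpeed config, not my actual setup) shows the method lives on the engine, not on the module it wraps:

import deepspeed

# zero_gather_16bit_weights_on_model_save is defined on DeepSpeedEngine;
# the wrapped module (engine.module) is a plain nn.Module and lacks it
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
print(hasattr(engine, "zero_gather_16bit_weights_on_model_save"))         # True
print(hasattr(engine.module, "zero_gather_16bit_weights_on_model_save"))  # False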

Expected behavior

The PPOv2 trainer can save checkpoints when using ZeRO-3.

bartoszzuk commented 3 days ago

Hey @Constructor-Sun, I encountered a similar issue using PPOv2Trainer. Here is what I think is going on. In PPOv2Trainer, the policy and value models are wrapped in a PolicyAndValueWrapper object, and it is that wrapper that DeepSpeed initializes. However, in the save_model method of PPOv2Trainer we temporarily overwrite the self.deepspeed attribute with the policy model, which is not a DeepSpeedEngine object, and therefore we get the AttributeError.
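For context, the wrapper is roughly this (simplified from the TRL source; the real forward also runs the value model's backbone and score head):

import torch.nn as nn

class PolicyAndValueWrapper(nn.Module):
    # both sub-models live under one nn.Module, so the wrapper's state-dict
    # keys are prefixed with "policy." and "value_model."
    def __init__(self, policy, value_model):
        super().__init__()
        self.policy = policy
        self.value_model = value_model

    def forward(self, **kwargs):
        # simplified: the actual implementation also returns value estimates
        return self.policy(**kwargs)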

https://github.com/huggingface/trl/blob/32d9d34eb15ce9803f571c1ebd0875d2c7bef82c/trl/trainer/ppov2_trainer.py#L243-L256
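
For reference, the linked save_model does roughly this (paraphrased):

def save_model(self, output_dir: Optional[str] = None, _internal_call: bool = False):
    backup_model = self.model
    self.model = self.model.policy  # save only the policy

    if self.is_deepspeed_enabled:
        backup_deepspeed = self.deepspeed
        # the policy is a plain nn.Module, not a DeepSpeedEngine, so
        # accelerator.get_state_dict() later fails on it
        self.deepspeed = self.model

    super().save_model(output_dir, _internal_call)

    self.model = backup_model
    if self.is_deepspeed_enabled:
        self.deepspeed = backup_deepspeed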

I've come up with a fix that seems to work:

  1. First, change the save_model method so it does not replace the self.deepspeed attribute. This way, the save_model method from the HF Trainer will use the DeepSpeedEngine object to gather weights from all processes for both the value and the policy model.
  2. Then, override the _save method to extract only the policy model weights (when DeepSpeed is enabled) and call the original _save, passing the modified state_dict.

from typing import Optional

from transformers import Trainer
from trl import PPOv2Trainer


class FixZero3CheckpointPPOv2Trainer(PPOv2Trainer):

    def save_model(self, output_dir: Optional[str] = None, _internal_call: bool = False):
        # Swap in the policy so only its weights are saved; leave
        # self.deepspeed untouched so ZeRO-3 gathering still works.
        backup_model = self.model
        self.model = self.model.policy  # save only the policy

        Trainer.save_model(self, output_dir, _internal_call)

        self.model = backup_model

    def _save(self, output_dir: Optional[str] = None, state_dict=None):
        # The gathered state dict covers the whole PolicyAndValueWrapper;
        # keep only the policy weights and strip the "policy." prefix.
        if self.is_deepspeed_enabled:
            state_dict = {
                name.removeprefix("policy."): param
                for name, param in state_dict.items()
                if name.startswith("policy.")
            }

        super()._save(output_dir, state_dict)
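
Usage is just a drop-in swap of the trainer class; the constructor arguments below are the usual ones from the TLDR example script, so adjust them to your setup:

trainer = FixZero3CheckpointPPOv2Trainer(
    config=training_args,   # PPOv2Config
    tokenizer=tokenizer,
    policy=policy,
    ref_policy=ref_policy,
    reward_model=reward_model,
    value_model=value_model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)
trainer.train()
trainer.save_model(training_args.output_dir)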

Let me know if this solution works with your setup. I would make a pull request with this fix but it feels a bit hacky. Maybe someone will come up with a more elegant solution :)

Constructor-Sun commented 2 days ago

> Hey @Constructor-Sun, I encountered a similar issue using PPOv2Trainer. [...] Let me know if this solution works with your setup.

Thanks for your help! I also found out that the bug happens because the policy isn't a DeepSpeedEngine object. The policy and value models are bundled together in PolicyAndValueWrapper and then wrapped into a DeepSpeedEngine by accelerator.prepare().

I have tested your solution, and it works well!

My solution to this is to rewrite save_model as:

import os
from typing import Optional

def save_model(self, output_dir: Optional[str] = None, _internal_call: bool = False):
    backup_model = self.model
    self.model = self.model.policy  # save only the policy

    if self.is_deepspeed_enabled:
        # point self.deepspeed at the policy while saving
        backup_deepspeed = self.deepspeed
        self.deepspeed = self.model

    os.makedirs(output_dir, exist_ok=True)
    self.model.save_pretrained(output_dir)

    self.model = backup_model

    if self.is_deepspeed_enabled:
        self.deepspeed = backup_deepspeed

But I didn't check if this was correct.
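
In particular, I suspect that under ZeRO-3 the policy's parameters are still sharded when save_pretrained runs, so the weights may need to be gathered first, e.g. with DeepSpeed's context manager (untested sketch):

import deepspeed

# untested: gather the ZeRO-3 sharded parameters before serializing,
# otherwise save_pretrained may write placeholder tensors
with deepspeed.zero.GatheredParameters(list(self.model.parameters()), modifier_rank=0):
    if self.accelerator.is_main_process:
        self.model.save_pretrained(output_dir)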

Thanks a lot, once more!