huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize' #28395

Closed — zhongshsh closed this issue 7 months ago

zhongshsh commented 8 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Run the repo https://github.com/stanleylsx/llms_tool in `rm_train` mode:

  1. read https://github.com/stanleylsx/llms_tool?tab=readme-ov-file#rm-training and modify config.py
  2. read https://github.com/stanleylsx/llms_tool?tab=readme-ov-file#deepspeed and modify config.py
  3. run `deepspeed --num_gpus 2 --master_port=9999 main.py`

The following error is then raised:

Traceback (most recent call last):
  File "llms_tool/main.py", line 34, in <module>
    train.train_reward_model()
  File "llms_tool/engines/train.py", line 309, in train_reward_model  # https://github.com/stanleylsx/llms_tool/blob/main/engines/train.py#L309
    train_result = trainer.train()
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1725, in _inner_training_loop
    self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/deepspeed.py", line 344, in deepspeed_init  # https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/deepspeed.py#L355
    hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'

Expected behavior

The training should run without raising this error.

ArthurZucker commented 8 months ago

Hey, thanks for reporting! Can you upgrade to the newest version of transformers? 🤗

zhongshsh commented 8 months ago

Thanks for your reply. After running `pip install -U transformers` and also `conda upgrade transformers`, the same error still occurs. Here is the version info after the upgrade:

- `transformers` version: 4.36.2
- Platform: Linux-5.15.0-18-shopee-generic-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.0
- Accelerate version: 0.22.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.0.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
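
As an aside, when both `pip install -U` and `conda upgrade` have been run in the same environment, two copies of the package can end up on disk and the interpreter may still import the stale one. A quick, hedged diagnostic (this only prints which copy Python resolves; it is not part of the original report) is:

```shell
# Print the file path of the transformers package Python actually imports.
# If the path does not match the version you just upgraded, you have a
# stale duplicate install shadowing the new one.
python - <<'EOF'
import importlib.util
spec = importlib.util.find_spec("transformers")
print("transformers found at:", spec.origin if spec else "not installed")
EOF
```

Comparing this path against `pip show transformers` quickly reveals a mixed pip/conda install.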
ArthurZucker commented 8 months ago

The config class being used seems to be `HfDeepSpeedConfig` rather than `HfTrainerDeepSpeedConfig` (which inherits from the former and defines `trainer_config_finalize`).
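
To illustrate why the `AttributeError` appears, here is a minimal sketch using stand-in classes (the real ones live in `transformers.integrations.deepspeed`; the bodies below are simplified assumptions, only the inheritance relationship matters). `trainer_config_finalize` exists only on the Trainer-aware subclass, so handing the base class to the Trainer's DeepSpeed init path fails with exactly this error:

```python
# Stand-in classes mirroring the assumed transformers class hierarchy.
# The real HfTrainerDeepSpeedConfig subclasses HfDeepSpeedConfig and adds
# trainer_config_finalize; the base class does not have that method.

class HfDeepSpeedConfig:
    """Base wrapper around a DeepSpeed config dict (stand-in)."""
    def __init__(self, config):
        self.config = config

class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
    """Trainer-aware subclass that can finalize the config (stand-in)."""
    def trainer_config_finalize(self, args, model, num_training_steps):
        # The real method fills in "auto" values from TrainingArguments;
        # here we just record one value to show the method resolves.
        self.config["num_training_steps"] = num_training_steps

base = HfDeepSpeedConfig({})
try:
    # Same call the Trainer makes in deepspeed_init.
    base.trainer_config_finalize(None, None, 100)
except AttributeError as e:
    print(e)  # same shape of message as in the traceback above

trainer_cfg = HfTrainerDeepSpeedConfig({})
trainer_cfg.trainer_config_finalize(None, None, 100)  # resolves fine
```

So the fix on the caller's side is to construct `HfTrainerDeepSpeedConfig` (or let the `Trainer` do so via `TrainingArguments(deepspeed=...)`) instead of instantiating the base class directly.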

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.