huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize' #28395

Closed — zhongshsh closed this issue 7 months ago

zhongshsh commented 8 months ago

System Info

Who can help?

No response

Information

Tasks

Reproduction

Run the repo https://github.com/stanleylsx/llms_tool in `rm_train` mode:

  1. read https://github.com/stanleylsx/llms_tool?tab=readme-ov-file#rm-training and modify config.py
  2. read https://github.com/stanleylsx/llms_tool?tab=readme-ov-file#deepspeed and modify config.py
  3. run `deepspeed --num_gpus 2 --master_port=9999 main.py`

The following error is then raised:

Traceback (most recent call last):
  File "llms_tool/main.py", line 34, in <module>
    train.train_reward_model()
  File "llms_tool/engines/train.py", line 309, in train_reward_model  # https://github.com/stanleylsx/llms_tool/blob/main/engines/train.py#L309
    train_result = trainer.train()
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/trainer.py", line 1725, in _inner_training_loop
    self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
  File "xxx/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/deepspeed.py", line 344, in deepspeed_init  # https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/deepspeed.py#L355
    hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'

Expected behavior

The training should run without raising this error.

ArthurZucker commented 8 months ago

Hey, thanks for reporting! Can you upgrade to the newest version of transformers? 🤗

zhongshsh commented 8 months ago

Thanks for your reply. After running `pip install -U transformers` and also `conda upgrade transformers`, the same error still occurs. Here is the version info after the upgrade:

- `transformers` version: 4.36.2
- Platform: Linux-5.15.0-18-shopee-generic-x86_64-with-glibc2.31
- Python version: 3.10.13
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.0
- Accelerate version: 0.22.0
- Accelerate config:    not found
- PyTorch version (GPU?): 2.0.1 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
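
As an aside, when both `pip install -U` and `conda upgrade` have been run in the same environment, two copies of the package can end up on disk and the interpreter may still import the stale one. A quick, hedged diagnostic (this only prints which copy Python resolves; it is not part of the original report) is:

```shell
# Print the file path of the transformers package Python actually imports.
# If the path does not match the version you just upgraded, you have a
# stale duplicate install shadowing the new one.
python - <<'EOF'
import importlib.util
spec = importlib.util.find_spec("transformers")
print("transformers found at:", spec.origin if spec else "not installed")
EOF
```

Comparing this path against `pip show transformers` quickly reveals a mixed pip/conda install.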
ArthurZucker commented 8 months ago

The config class being used seems to be `HfDeepSpeedConfig` rather than `HfTrainerDeepSpeedConfig` (which inherits from the former and defines `trainer_config_finalize`).
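
To illustrate why the `AttributeError` appears, here is a minimal sketch using stand-in classes (the real ones live in `transformers.integrations.deepspeed`; the bodies below are simplified assumptions, only the inheritance relationship matters). `trainer_config_finalize` exists only on the Trainer-aware subclass, so handing the base class to the Trainer's DeepSpeed init path fails with exactly this error:

```python
# Stand-in classes mirroring the assumed transformers class hierarchy.
# The real HfTrainerDeepSpeedConfig subclasses HfDeepSpeedConfig and adds
# trainer_config_finalize; the base class does not have that method.

class HfDeepSpeedConfig:
    """Base wrapper around a DeepSpeed config dict (stand-in)."""
    def __init__(self, config):
        self.config = config

class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
    """Trainer-aware subclass that can finalize the config (stand-in)."""
    def trainer_config_finalize(self, args, model, num_training_steps):
        # The real method fills in "auto" values from TrainingArguments;
        # here we just record one value to show the method resolves.
        self.config["num_training_steps"] = num_training_steps

base = HfDeepSpeedConfig({})
try:
    # Same call the Trainer makes in deepspeed_init.
    base.trainer_config_finalize(None, None, 100)
except AttributeError as e:
    print(e)  # same shape of message as in the traceback above

trainer_cfg = HfTrainerDeepSpeedConfig({})
trainer_cfg.trainer_config_finalize(None, None, 100)  # resolves fine
```

So the fix on the caller's side is to construct `HfTrainerDeepSpeedConfig` (or let the `Trainer` do so via `TrainingArguments(deepspeed=...)`) instead of instantiating the base class directly.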

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.