huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

"bfloat16.enabled" needs to be specified when training T5 #2469

Closed HCHCXY closed 7 months ago

HCHCXY commented 8 months ago

I ran into the following error when training T5: "ValueError: bfloat16.enabled not found in kwargs. Please specify bfloat16.enabled without auto(set to correct value) in the DeepSpeed config file or pass it in kwargs."

I use transformers==4.28.0 and accelerate==0.20.3.

The variable trainer has type "transformers.trainer_seq2seq.Seq2SeqTrainer". I don't understand how to pass the bfloat16 configuration to the trainer.train method. Could anyone help?

The traceback is listed below:

  File "./ds_train.py", line 378, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/transformers/trainer.py", line 1539, in train
    return inner_training_loop(
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/transformers/trainer.py", line 1659, in _inner_training_loop
    model, self.optimizer, self.lr_scheduler = self.accelerator.prepare(
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/accelerate/accelerator.py", line 1178, in prepare
    result = self._prepare_deepspeed(*args)
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/accelerate/accelerator.py", line 1486, in _prepare_deepspeed
    deepspeed_plugin.deepspeed_config_process(must_match=False, **config_kwargs)
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/accelerate/utils/dataclasses.py", line 624, in deepspeed_config_process
    self.deepspeed_config_process(
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/accelerate/utils/dataclasses.py", line 628, in deepspeed_config_process
    self.fill_match(prefix + key, mismatches, must_match=must_match, **kwargs)
  File "/home/hechenghua/anaconda3/envs/swiftsage/lib/python3.8/site-packages/accelerate/utils/dataclasses.py", line 603, in fill_match
    raise ValueError(
ValueError: bfloat16.enabled not found in kwargs. Please specify bfloat16.enabled without auto(set to correct value) in the DeepSpeed config file or pass it in kwargs.

SunMarc commented 8 months ago

cc @pacman100

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
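
The thread never received an answer, but the error message itself suggests one possible route: replace the "auto" placeholder for bfloat16.enabled with an explicit value in the DeepSpeed config. The sketch below illustrates that idea in plain Python. The config contents and the helper names find_auto_keys and set_explicit are hypothetical (the original config file was not posted), and whether this alone resolves the issue on transformers 4.28.0 / accelerate 0.20.3 is an assumption, not something confirmed in the thread.

```python
import copy


def find_auto_keys(config, prefix=""):
    """Return the dotted paths of all entries whose value is the string "auto"."""
    found = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "bfloat16" or "zero_optimization".
            found.extend(find_auto_keys(value, prefix=path + "."))
        elif value == "auto":
            found.append(path)
    return found


def set_explicit(config, dotted_key, value):
    """Return a copy of config with the entry at dotted_key set to value."""
    updated = copy.deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical DeepSpeed config; the real one from the issue is unknown.
ds_config = {
    "bfloat16": {"enabled": "auto"},
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}

# Replace the placeholder named in the error message with an explicit boolean.
fixed = set_explicit(ds_config, "bfloat16.enabled", True)

print(find_auto_keys(ds_config))  # "auto" entries before the change
print(find_auto_keys(fixed))      # "auto" entries that remain afterwards
```

The resulting dict (or the same content saved as a JSON file) could then be handed to the trainer through the deepspeed argument of TrainingArguments, with bf16=True set alongside it so that both settings agree. Any other "auto" entries are left untouched here; whether they are filled in automatically depends on the library versions in use.
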