Open shadow150519 opened 1 month ago
I found similar issues: #2736 in DeepSpeed and #94907 in PyTorch.
I faced similar issues when I upgraded the transformers package from 4.28.0 to 4.44.2. Once I reverted and kept torch==2.2.1, lightning==2.2.1, accelerate==0.27.2, and deepspeed==0.14.0, the error went away. Why it works like that is anyone's guess.
Note that I am using Python 3.11.
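For anyone who wants to reproduce the working environment described above, the reverted version set can be captured in a requirements file. This is a sketch based only on the versions listed in this comment (the transformers==4.28.0 pin is the pre-upgrade version mentioned); adjust as needed for your CUDA build:

```
torch==2.2.1
lightning==2.2.1
accelerate==0.27.2
deepspeed==0.14.0
transformers==4.28.0
```

Install with pip install -r requirements.txt in a fresh Python 3.11 environment before retrying.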
Describe the bug
I'm running the BingBertSquad example in DeepSpeedExamples/training/BingBertSquad, using the google-bert/bert-large-uncased model from Hugging Face. I run

bash run_squad_deepspeed.sh 4 ckpt/bert_large_uncased/pytorch_model.bin /dataset /output

to finetune the model. To use the HF checkpoint, I slightly changed run_squad_deepspeed.sh.
However, I end up with the following error:
To Reproduce
Steps to reproduce the behavior:
1. Download the HF bert-large-uncased model and put it into the ckpt/bert_large_uncased folder.
2. Run bash run_squad_deepspeed.sh 4 ckpt/bert_large_uncased/pytorch_model.bin /dataset /output
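One common source of checkpoint-loading errors in this setup is parameter naming: Hugging Face BERT checkpoints typically prefix parameter names with bert., while example training scripts may expect unprefixed keys. The sketch below is a hypothetical illustration of such a remapping step, not code from DeepSpeedExamples; the helper name remap_hf_keys is my own:

```python
def remap_hf_keys(state_dict, prefix="bert."):
    """Strip a leading prefix from checkpoint keys, leaving other keys intact.

    Hypothetical helper: HF pytorch_model.bin files for BERT usually name
    weights like "bert.encoder.layer.0....", and scripts that build the
    encoder directly often expect "encoder.layer.0....".
    """
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# With a real checkpoint this would be used roughly as:
#   import torch
#   sd = torch.load("ckpt/bert_large_uncased/pytorch_model.bin", map_location="cpu")
#   sd = remap_hf_keys(sd)
```

Whether this particular remap applies depends on how run_squad_deepspeed.sh was modified; inspecting the checkpoint's keys (sd.keys()) against the model's expected state dict is the quickest way to check.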
Expected behavior
ds_report output
Please run ds_report to give us details about your setup.

System info (please complete the following information):
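Since the workaround in the comment above depends on an exact version set, it can help to verify the installed versions alongside the ds_report output. A minimal sketch using only the standard library (the pinned set is assumed from that comment; check_pins is my own helper, not part of DeepSpeed):

```python
from importlib.metadata import version, PackageNotFoundError

# Assumed working version set, taken from the comment above.
PINNED = {
    "torch": "2.2.1",
    "lightning": "2.2.1",
    "accelerate": "0.27.2",
    "deepspeed": "0.14.0",
    "transformers": "4.28.0",
}

def check_pins(pins):
    """Return {package: (installed_or_None, expected)} for every mismatch."""
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[pkg] = (installed, expected)
    return mismatches

# Example: print(check_pins(PINNED)) — an empty dict means the environment
# matches the version set reported to work.
```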