System Info
transformers version: 4.42.3
Using distributed or parallel set-up in script?: Yes
Using GPU in script?: Yes
GPU type: Tesla V100-SXM2-32GB
Who can help?
@muellerzr
Information
[x] The official example scripts
[x] My own modified scripts
Tasks
[x] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
The issue arises when the script is launched with deepspeed. It seems that the model has not been moved to the GPU yet when create_optimizer is called, so creating the optimizer fails.

Launch command:

Output:

However, setting deepspeed_dict=None and using the same launch command does not cause any error, and training continues as usual. So I am guessing this is caused by conflicting DeepSpeed settings or by the DeepSpeed configuration being parsed incorrectly.
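Roughly, the setup looks like the sketch below (the model name, dataset, and DeepSpeed values are placeholders rather than the actual script); deepspeed_dict is the configuration dict passed to the deepspeed argument of TrainingArguments, which accepts either a dict or a path to a JSON file.

```python
# Minimal sketch (not the actual script): the model, dataset, and DeepSpeed
# values below are placeholders chosen only to illustrate the configuration.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# DeepSpeed configuration passed in code as a dict (deepspeed_dict above).
deepspeed_dict = {
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

dataset = load_dataset("glue", "mrpc", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["sentence1"], ex["sentence2"],
                         truncation=True, padding="max_length"),
    batched=True,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    deepspeed=deepspeed_dict,  # passing the dict enables the DeepSpeed integration
    # deepspeed=None,          # with this instead, training runs without errors
)

trainer = Trainer(model=model, args=args, train_dataset=dataset)
trainer.train()  # fails during optimizer creation when the DeepSpeed config is set
```

A script like this is launched with the DeepSpeed launcher (e.g. deepspeed script.py), which is when the failure in create_optimizer appears; the same command with deepspeed=None trains normally.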
Expected behavior
Training should complete without errors.