ojh31 opened this issue 2 months ago (status: Open)
Transferring to transformers as this is really a transformers issue. Things I need: versions of not just accelerate but transformers as well, and can you try updating to their latest?
I'm on the latest transformers already (4.42.4), and accelerate==0.29.2
Can you try the latest accelerate as well?
Same behavior with accelerate==0.32.1
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Thanks for your issue @ojh31!
This is indeed an issue, but it isn't related to FSDP per se. It's related to this unsafe code path, in which we replace the existing `state_dict` with `None` in order to load it differently:

This isn't safe, as we can see here. @SunMarc, would you have the bandwidth to take a look at this issue?
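For context, here is a simplified, paraphrased sketch of the code path being described, reconstructed from the issue text below; it is not the exact transformers source:

```python
# Paraphrased sketch of the problematic path inside
# PreTrainedModel.from_pretrained, per the description in this thread.
if is_fsdp_enabled():
    # FSDP forces the low-memory loading path...
    low_cpu_mem_usage = True

if low_cpu_mem_usage:
    # ...which later replaces a caller-supplied state_dict with None so the
    # weights can be materialized differently; the user's in-memory weights
    # are silently dropped here.
    state_dict = None
```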
System Info

Information

Tasks

`no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
The following code successfully loads the model checkpoint if run using `python foo.py` or `accelerate launch foo.py`, but not with FSDP enabled via `accelerate launch --use_fsdp foo.py`.

This seems like a bug: the `PreTrainedModel.from_pretrained` docs say that `pretrained_model_name_or_path` can be None "if you are both providing the configuration and state dictionary", which I do here. But then, if `is_fsdp_enabled()` is True, we set `low_cpu_mem_usage = True` and thus in turn `state_dict = None`, which causes the loading to fail.
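A minimal reconstruction of the kind of script being described (hypothetical `foo.py`; the original snippet was not preserved in this thread, and `bert-base-uncased` is assumed from the expected output below):

```python
# foo.py -- hypothetical reconstruction of the reproduction script.
from transformers import BertConfig, BertForSequenceClassification

config = BertConfig.from_pretrained("bert-base-uncased")
# Build a model once just to obtain a state dict to pass in explicitly.
state_dict = BertForSequenceClassification(config).state_dict()

# Per the from_pretrained docs, pretrained_model_name_or_path may be None
# when both the configuration and the state dictionary are provided.
model = BertForSequenceClassification.from_pretrained(
    None,
    config=config,
    state_dict=state_dict,
)
print(type(model))
```

Run with `python foo.py` or `accelerate launch foo.py`, this prints the model class; with `accelerate launch --use_fsdp foo.py`, the supplied `state_dict` is discarded and loading fails.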
Error message:
Expected behavior
Should output:
<class 'transformers.models.bert.modeling_bert.BertForSequenceClassification'>