rucnyz opened this issue 11 months ago
I was freezing my input embeddings the same way as you, using DeepSpeed ZeRO stage 2, and the resulting weights can't be read back in; maybe related?
for param in emb.parameters():
    param.requires_grad = False
And I'm getting the same problem, where it can't reload the weights because of a missing emb.weight.
I've dropped a breakpoint() here: https://github.com/microsoft/DeepSpeed/blob/master/deepspeed/utils/zero_to_fp32.py#L105 and observed this:
(Pdb) [x for x in state_dict['module'] if 'emb' in x]
['_forward_module.emb.weight']
(Pdb) [x for x in state_dict[PARAM_SHAPES] if 'emb' in x]
[]
(Pdb) state_dict[FROZEN_PARAM_SHAPES]
None
So they're in the state_dict, but not in state_dict[FROZEN_PARAM_SHAPES].
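For anyone who wants to check their own checkpoint without patching zero_to_fp32.py, here is a minimal sketch that inspects the model-states file directly. The checkpoint path and step tag are assumptions (adjust them to your run); the string keys correspond to the PARAM_SHAPES / FROZEN_PARAM_SHAPES constants in deepspeed.checkpoint.constants:

```python
import torch

# Assumed path: <ckpt_dir>/<tag>/mp_rank_00_model_states.pt from a
# single-process ZeRO stage 2 run; adjust the tag/rank for your setup.
state_dict = torch.load(
    "checkpoint/global_step100/mp_rank_00_model_states.pt",
    map_location="cpu",
)

# The frozen weight is present in the raw module states...
print([k for k in state_dict["module"] if "emb" in k])

# ...but absent from the trainable param shapes (assuming the usual
# layout of one dict of name -> shape per parameter group)...
print([k for group in state_dict["param_shapes"] for k in group if "emb" in k])

# ...and the frozen-param metadata was never written at all.
print(state_dict.get("frozen_param_shapes"))  # -> None
```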
This is as far as I've been able to debug; hopefully this helps with further debugging.
edit: I've also confirmed that the only place in the entire state_dict my emb shows up is under 'module':
{'module': OrderedDict([('_forward_module.emb.weight', tensor([[ ...]])]), ...}
The repro script exits with the error.
Describe the bug
This bug is similar to #4055; I provide a repro here.
To Reproduce
Please put these three files in the same directory (remember to rename the first two from .txt -> .py and deepspeed_config.txt -> deepspeed_config.yaml), and reproduce the result with: train_test.txt utils.txt deepspeed_config.txt
Currently, the code runs fine, but if I uncomment these three lines (147 to 149 in train_test.py), the code throws an error as follows:
errors:
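Since the attachments aren't inlined above, here is a rough, self-contained sketch of the failing pattern as I understand it. SimpleModel, the config dict, and the freezing loop are illustrative stand-ins for the attached files, not the actual contents of train_test.py; the sketch uses deepspeed.initialize directly (launch it with the deepspeed launcher), whereas the repro itself goes through accelerate:

```python
import torch
import deepspeed
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(100, 16)
        self.head = torch.nn.Linear(16, 2)

    def forward(self, x):
        return self.head(self.emb(x))


model = SimpleModel()

# The commented-out lines presumably freeze a parameter like this:
for param in model.emb.parameters():
    param.requires_grad = False

ds_config = {
    "train_batch_size": 4,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}

# Only the trainable parameters are handed to the optimizer.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=[p for p in model.parameters() if p.requires_grad],
    config=ds_config,
)

# One training step, then checkpoint.
x = torch.randint(0, 100, (4,), device=engine.device)
loss = engine(x).sum()
engine.backward(loss)
engine.step()
engine.save_checkpoint("ckpt")

# The consolidated fp32 state dict comes back without 'emb.weight',
# so loading it into the model fails with a missing-key error.
fp32_sd = get_fp32_state_dict_from_zero_checkpoint("ckpt")
model.load_state_dict(fp32_sd)  # RuntimeError: Missing key(s): 'emb.weight'
```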
System info (please complete the following information):
Launcher context
accelerate launch
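The invocation is presumably along these lines; the exact flags depend on the attached config file:

```bash
# Assumed command; adjust to match the attached deepspeed_config.yaml.
accelerate launch --config_file deepspeed_config.yaml train_test.py
```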