sahil280114 / codealpaca

bug: get empty state dict #8

Open Anditty opened 1 year ago

Anditty commented 1 year ago

I followed the steps in the README, but I get an empty state dict. Here is the code and the output.

Code:

trainer = Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
print(trainer.model.state_dict()['model.layers.30.mlp.gate_proj.weight'])
print('training')
trainer.train()
print(trainer.model.state_dict()['model.layers.30.mlp.gate_proj.weight'])
print('trained')
trainer.save_state()
print(trainer.model.state_dict()['model.layers.30.mlp.gate_proj.weight'])
print('saved')

Output:

tensor([[ 1.5984e-03, -1.6602e-02, -1.6460e-03, ..., -1.6632e-02, -1.9989e-02, 1.1383e-02],
        [ 9.5062e-03, 3.3356e-02, 5.6343e-03, ..., -3.6743e-02, -3.2074e-02, 2.6810e-02],
        [ 1.1917e-02, -2.1515e-02, -2.6352e-02, ..., 2.7328e-02, -4.0550e-03, 1.5320e-02],
        ...,
        [-2.8503e-02, 1.5316e-03, -1.8753e-02, ..., 2.9846e-02, -1.9440e-02, 2.6703e-02],
        [ 5.6505e-05, -4.5898e-02, 2.0660e-02, ..., -6.5689e-03, -3.2043e-02, 1.8005e-02],
        [-7.1106e-03, -7.1487e-03, -4.5624e-03, ..., 1.3138e-02, -4.3060e-02, -1.5869e-02]])
training
tensor([], device='cuda:0', dtype=torch.float16)
trained
tensor([], device='cuda:0', dtype=torch.float16)
saved
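The output above shows a weight that is populated before training and has zero elements afterwards. That pattern is consistent with the parameters being partitioned across ranks (for example by DeepSpeed ZeRO stage 3), although the contents of ds_config.json are not shown here, so that is an assumption. A minimal sketch for checking how many entries of a state dict are empty follows; find_empty_params is a hypothetical helper, and FakeTensor is only a stand-in so the sketch runs without torch. Any object with a numel() method, such as a torch.Tensor, would work the same way.

```python
def find_empty_params(state_dict):
    """Return the names of all entries whose tensor holds zero elements."""
    return [name for name, tensor in state_dict.items() if tensor.numel() == 0]


class FakeTensor:
    """Stand-in for torch.Tensor (illustration only); exposes numel()."""

    def __init__(self, n_elements):
        self._n = n_elements

    def numel(self):
        return self._n


# Hypothetical state dict: one populated weight, one empty placeholder.
demo_state_dict = {
    "model.layers.29.mlp.gate_proj.weight": FakeTensor(1024),
    "model.layers.30.mlp.gate_proj.weight": FakeTensor(0),
}
empty_names = find_empty_params(demo_state_dict)
print(empty_names)
```

With a real run, one could call find_empty_params(trainer.model.state_dict()) before and after trainer.train() to see whether only some parameters or all of them come back empty.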

Anditty commented 1 year ago

I found that if I use the --deepspeed ds_config.json option, then print(trainer.model.state_dict()['model.layers.30.mlp.gate_proj.weight']) prints tensor([], device='cuda:0', dtype=torch.float16). Also, the README.md says that FSDP full_shard mode is used, but FSDP and DeepSpeed should not be used at the same time.
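The point about not combining FSDP and DeepSpeed can be checked before launching. The following is a rough sketch, not part of this repository: check_launch_flags is a hypothetical helper that inspects a command line for the --deepspeed and --fsdp options (both are Hugging Face TrainingArguments options) and raises if both are set. The example argument values are assumptions for illustration.

```python
import argparse


def check_launch_flags(argv):
    """Report which sharding backend a command line requests; reject both."""
    # allow_abbrev=False so options like --fsdp_transformer_layer_cls_to_wrap
    # are never mistaken for --fsdp.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    parser.add_argument("--deepspeed", default=None)
    parser.add_argument("--fsdp", default="")
    args, _unknown = parser.parse_known_args(argv)

    uses_deepspeed = args.deepspeed is not None
    uses_fsdp = bool(args.fsdp.strip())
    if uses_deepspeed and uses_fsdp:
        raise ValueError(
            "Both --deepspeed and --fsdp are set; pick one sharding backend."
        )
    return {"deepspeed": uses_deepspeed, "fsdp": uses_fsdp}


# Only DeepSpeed requested: passes the check.
deepspeed_only = check_launch_flags(["--deepspeed", "ds_config.json"])
print(deepspeed_only)

# Both requested: the helper raises.
try:
    check_launch_flags(
        ["--deepspeed", "ds_config.json", "--fsdp", "full_shard auto_wrap"]
    )
    conflict_detected = False
except ValueError:
    conflict_detected = True
print(conflict_detected)
```

Running a check like this on the training command from the README would show whether the FSDP options are still present once --deepspeed ds_config.json is added.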