hpcaitech / ColossalAI-Examples

Examples of training models with hybrid parallelism using ColossalAI

Problem with saving model state dict #156

Closed ouyangliqi closed 1 year ago

ouyangliqi commented 1 year ago

🐛 Describe the bug

https://github.com/hpcaitech/ColossalAI-Examples/blob/f743872c2089d6bb5e593db6a8a48d427e6b2b1e/language/opt/run_clm.py#L504

The code on this line should be `model_state = model.state_dict()`. Even after applying that fix, however, every entry in the saved state dict is None/empty, and loading the checkpoint back with `OPTForCausalLM.from_pretrained` fails with the traceback below.
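
For illustration, here is a minimal, self-contained sketch of the one-line fix; the `nn.Linear` model and the `checkpoint.pt` file name are hypothetical stand-ins for the OPT model and checkpoint path used in `run_clm.py`:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the OPT model built in run_clm.py.
model = nn.Linear(4, 4)

# Buggy pattern: the return value of state_dict() is discarded, so the
# variable that is later passed to torch.save() never holds the weights.
# model.state_dict()

# Proposed fix: bind the returned OrderedDict and save that.
model_state = model.state_dict()
torch.save(model_state, "checkpoint.pt")

# For a plain nn.Module this round-trips cleanly; under ColossalAI's sharded
# training the gathered tensors can still come back empty, which is the
# second symptom reported in this issue.
restored = nn.Linear(4, 4)
restored.load_state_dict(torch.load("checkpoint.pt"))
```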

Traceback (most recent call last):
  File "generate.py", line 238, in <module>
    main()
  File "generate.py", line 211, in main
    model = OPTForCausalLM.from_pretrained(args.model_path)
  File "/mnt/datadisk0/ouyangliqi/miniconda3/envs/colossalai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2119, in from_pretrained
    model, missing_keys, unexpected_keys, mismatched_keys, error_msgs = cls._load_pretrained_model(
  File "/mnt/datadisk0/ouyangliqi/miniconda3/envs/colossalai/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2376, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for OPTForCausalLM:
    size mismatch for model.decoder.embed_tokens.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([50272, 4096]).
    size mismatch for model.decoder.embed_positions.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([2050, 4096]).
    size mismatch for model.decoder.final_layer_norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.final_layer_norm.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn.k_proj.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
    size mismatch for model.decoder.layers.0.self_attn.k_proj.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn.v_proj.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
    size mismatch for model.decoder.layers.0.self_attn.v_proj.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn.q_proj.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
    size mismatch for model.decoder.layers.0.self_attn.q_proj.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
    size mismatch for model.decoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn_layer_norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.self_attn_layer_norm.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.fc1.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([16384, 4096]).
    size mismatch for model.decoder.layers.0.fc1.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([16384]).
    size mismatch for model.decoder.layers.0.fc2.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096, 16384]).
    size mismatch for model.decoder.layers.0.fc2.bias: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    size mismatch for model.decoder.layers.0.final_layer_norm.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([4096]).
    ...
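
As a quick diagnostic (not from the repository; the checkpoint path is a placeholder), the saved file can be inspected directly to confirm that the tensors were saved empty, matching the `torch.Size([0])` shapes in the traceback:

```python
import torch

# Load the saved state dict on CPU and list entries with zero elements.
# "checkpoint.pt" is a placeholder for the actual checkpoint path.
state = torch.load("checkpoint.pt", map_location="cpu")
empty = [k for k, v in state.items() if torch.is_tensor(v) and v.numel() == 0]
print(f"{len(empty)} of {len(state)} entries are empty tensors")
for k in empty[:5]:
    print("empty:", k, tuple(state[k].shape))
```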

Environment

CUDA: 11.3
PyTorch: 1.12
transformers: 4.21.0.dev0

FrankLeeeee commented 1 year ago

Thanks for spotting this. Give me some time to reproduce this case.

feifeibear commented 1 year ago

I think the issue has been fixed after we communicated offline.