zhengmk321 closed this 1 year ago
Hi, I figured out the issue above. It turns out that I had indeed installed the wrong version of energonai. After installing the correct one by following the instructions in README.md, I tried hosting the OPT-30B model, but I got these error messages:
Process SpawnProcess-1:
Process SpawnProcess-2:
Traceback (most recent call last):
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/worker.py", line 32, in __init__
self.model: nn.Module = model_fn(**model_kwargs).cuda()
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/model/model_factory.py", line 323, in opt_30B
return create_pipeline_model(**model_kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/model/model_factory.py", line 223, in create_pipeline_model
load_checkpoint(model_kwargs["checkpoint"], model, preprocess_fn=preprocess_fn, **model_kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/utils/checkpointing.py", line 95, in load_checkpoint
model.load_state_dict(model_state, strict=strict)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PipelineModel:
size mismatch for blocks.0.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.0.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.0.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.0.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.1.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.1.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.1.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.1.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.2.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.2.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.2.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.2.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.3.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.3.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.3.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.3.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.4.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.4.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.4.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.4.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.5.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.5.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.5.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.5.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.6.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.6.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.6.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.6.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.7.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.7.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.7.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.7.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.8.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.8.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.8.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.8.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.9.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.9.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.9.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.9.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.10.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.10.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.10.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.10.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.11.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.11.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.11.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.11.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.12.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.12.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.12.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.12.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.13.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.13.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.13.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.13.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.14.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.14.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.14.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.14.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.15.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.15.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.15.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.15.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.16.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.16.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.16.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.16.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.17.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.17.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.17.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.17.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.18.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.18.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.18.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.18.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.19.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.19.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.19.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.19.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.20.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.20.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.20.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.20.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.21.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.21.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.21.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.21.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.22.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.22.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.22.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.22.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.23.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.23.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.23.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.23.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.24.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.24.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.24.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.24.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.25.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.25.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.25.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.25.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.26.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.26.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.26.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.26.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.27.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.27.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.27.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.27.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.28.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.28.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.28.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.28.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.29.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.29.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.29.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.29.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.30.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.30.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.30.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.30.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.31.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.31.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.31.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.31.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.32.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.32.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.32.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.32.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.33.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.33.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.33.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.33.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.34.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.34.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.34.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.34.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.35.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.35.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.35.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.35.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.36.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.36.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.36.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.36.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.37.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.37.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.37.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.37.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.38.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.38.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.38.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.38.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.39.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.39.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.39.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.39.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.40.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.40.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.40.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.40.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.41.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.41.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.41.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.41.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.42.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.42.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.42.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.42.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.43.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.43.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.43.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.43.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.44.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.44.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.44.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.44.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.45.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.45.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.45.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.45.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.46.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.46.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.46.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.46.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.47.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.47.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.47.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.47.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for head.dense.weight: copying a param with shape torch.Size([12568, 3584]) from checkpoint, the shape in current model is torch.Size([50272, 3584]).
Traceback (most recent call last):
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/worker.py", line 32, in __init__
self.model: nn.Module = model_fn(**model_kwargs).cuda()
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/model/model_factory.py", line 323, in opt_30B
return create_pipeline_model(**model_kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/model/model_factory.py", line 223, in create_pipeline_model
load_checkpoint(model_kwargs["checkpoint"], model, preprocess_fn=preprocess_fn, **model_kwargs)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/energonai/utils/checkpointing.py", line 95, in load_checkpoint
model.load_state_dict(model_state, strict=strict)
File "/work/09308/zhengmk/miniconda3/envs/colossal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PipelineModel:
Missing key(s) in state_dict: "blocks.0.attn.query_.weight", "blocks.0.attn.query_.bias", "blocks.0.attn.key_.weight", "blocks.0.attn.key_.bias", "blocks.0.attn.value_.weight", "blocks.0.attn.value_.bias", "blocks.1.attn.query_.weight", "blocks.1.attn.query_.bias", "blocks.1.attn.key_.weight", "blocks.1.attn.key_.bias", "blocks.1.attn.value_.weight", "blocks.1.attn.value_.bias", "blocks.2.attn.query_.weight", "blocks.2.attn.query_.bias", "blocks.2.attn.key_.weight", "blocks.2.attn.key_.bias", "blocks.2.attn.value_.weight", "blocks.2.attn.value_.bias", "blocks.3.attn.query_.weight", "blocks.3.attn.query_.bias", "blocks.3.attn.key_.weight", "blocks.3.attn.key_.bias", "blocks.3.attn.value_.weight", "blocks.3.attn.value_.bias", "blocks.4.attn.query_.weight", "blocks.4.attn.query_.bias", "blocks.4.attn.key_.weight", "blocks.4.attn.key_.bias", "blocks.4.attn.value_.weight", "blocks.4.attn.value_.bias", "blocks.5.attn.query_.weight", "blocks.5.attn.query_.bias", "blocks.5.attn.key_.weight", "blocks.5.attn.key_.bias", "blocks.5.attn.value_.weight", "blocks.5.attn.value_.bias", "blocks.6.attn.query_.weight", "blocks.6.attn.query_.bias", "blocks.6.attn.key_.weight", "blocks.6.attn.key_.bias", "blocks.6.attn.value_.weight", "blocks.6.attn.value_.bias", "blocks.7.attn.query_.weight", "blocks.7.attn.query_.bias", "blocks.7.attn.key_.weight", "blocks.7.attn.key_.bias", "blocks.7.attn.value_.weight", "blocks.7.attn.value_.bias", "blocks.8.attn.query_.weight", "blocks.8.attn.query_.bias", "blocks.8.attn.key_.weight", "blocks.8.attn.key_.bias", "blocks.8.attn.value_.weight", "blocks.8.attn.value_.bias", "blocks.9.attn.query_.weight", "blocks.9.attn.query_.bias", "blocks.9.attn.key_.weight", "blocks.9.attn.key_.bias", "blocks.9.attn.value_.weight", "blocks.9.attn.value_.bias", "blocks.10.attn.query_.weight", "blocks.10.attn.query_.bias", "blocks.10.attn.key_.weight", "blocks.10.attn.key_.bias", "blocks.10.attn.value_.weight", "blocks.10.attn.value_.bias", 
"blocks.11.attn.query_.weight", "blocks.11.attn.query_.bias", "blocks.11.attn.key_.weight", "blocks.11.attn.key_.bias", "blocks.11.attn.value_.weight", "blocks.11.attn.value_.bias", "blocks.12.attn.query_.weight", "blocks.12.attn.query_.bias", "blocks.12.attn.key_.weight", "blocks.12.attn.key_.bias", "blocks.12.attn.value_.weight", "blocks.12.attn.value_.bias", "blocks.13.attn.query_.weight", "blocks.13.attn.query_.bias", "blocks.13.attn.key_.weight", "blocks.13.attn.key_.bias", "blocks.13.attn.value_.weight", "blocks.13.attn.value_.bias", "blocks.14.attn.query_.weight", "blocks.14.attn.query_.bias", "blocks.14.attn.key_.weight", "blocks.14.attn.key_.bias", "blocks.14.attn.value_.weight", "blocks.14.attn.value_.bias", "blocks.15.attn.query_.weight", "blocks.15.attn.query_.bias", "blocks.15.attn.key_.weight", "blocks.15.attn.key_.bias", "blocks.15.attn.value_.weight", "blocks.15.attn.value_.bias", "blocks.16.attn.query_.weight", "blocks.16.attn.query_.bias", "blocks.16.attn.key_.weight", "blocks.16.attn.key_.bias", "blocks.16.attn.value_.weight", "blocks.16.attn.value_.bias", "blocks.17.attn.query_.weight", "blocks.17.attn.query_.bias", "blocks.17.attn.key_.weight", "blocks.17.attn.key_.bias", "blocks.17.attn.value_.weight", "blocks.17.attn.value_.bias", "blocks.18.attn.query_.weight", "blocks.18.attn.query_.bias", "blocks.18.attn.key_.weight", "blocks.18.attn.key_.bias", "blocks.18.attn.value_.weight", "blocks.18.attn.value_.bias", "blocks.19.attn.query_.weight", "blocks.19.attn.query_.bias", "blocks.19.attn.key_.weight", "blocks.19.attn.key_.bias", "blocks.19.attn.value_.weight", "blocks.19.attn.value_.bias", "blocks.20.attn.query_.weight", "blocks.20.attn.query_.bias", "blocks.20.attn.key_.weight", "blocks.20.attn.key_.bias", "blocks.20.attn.value_.weight", "blocks.20.attn.value_.bias", "blocks.21.attn.query_.weight", "blocks.21.attn.query_.bias", "blocks.21.attn.key_.weight", "blocks.21.attn.key_.bias", "blocks.21.attn.value_.weight", 
"blocks.21.attn.value_.bias", "blocks.22.attn.query_.weight", "blocks.22.attn.query_.bias", "blocks.22.attn.key_.weight", "blocks.22.attn.key_.bias", "blocks.22.attn.value_.weight", "blocks.22.attn.value_.bias", "blocks.23.attn.query_.weight", "blocks.23.attn.query_.bias", "blocks.23.attn.key_.weight", "blocks.23.attn.key_.bias", "blocks.23.attn.value_.weight", "blocks.23.attn.value_.bias", "blocks.24.attn.query_.weight", "blocks.24.attn.query_.bias", "blocks.24.attn.key_.weight", "blocks.24.attn.key_.bias", "blocks.24.attn.value_.weight", "blocks.24.attn.value_.bias", "blocks.25.attn.query_.weight", "blocks.25.attn.query_.bias", "blocks.25.attn.key_.weight", "blocks.25.attn.key_.bias", "blocks.25.attn.value_.weight", "blocks.25.attn.value_.bias", "blocks.26.attn.query_.weight", "blocks.26.attn.query_.bias", "blocks.26.attn.key_.weight", "blocks.26.attn.key_.bias", "blocks.26.attn.value_.weight", "blocks.26.attn.value_.bias", "blocks.27.attn.query_.weight", "blocks.27.attn.query_.bias", "blocks.27.attn.key_.weight", "blocks.27.attn.key_.bias", "blocks.27.attn.value_.weight", "blocks.27.attn.value_.bias", "blocks.28.attn.query_.weight", "blocks.28.attn.query_.bias", "blocks.28.attn.key_.weight", "blocks.28.attn.key_.bias", "blocks.28.attn.value_.weight", "blocks.28.attn.value_.bias", "blocks.29.attn.query_.weight", "blocks.29.attn.query_.bias", "blocks.29.attn.key_.weight", "blocks.29.attn.key_.bias", "blocks.29.attn.value_.weight", "blocks.29.attn.value_.bias", "blocks.30.attn.query_.weight", "blocks.30.attn.query_.bias", "blocks.30.attn.key_.weight", "blocks.30.attn.key_.bias", "blocks.30.attn.value_.weight", "blocks.30.attn.value_.bias", "blocks.31.attn.query_.weight", "blocks.31.attn.query_.bias", "blocks.31.attn.key_.weight", "blocks.31.attn.key_.bias", "blocks.31.attn.value_.weight", "blocks.31.attn.value_.bias", "blocks.32.attn.query_.weight", "blocks.32.attn.query_.bias", "blocks.32.attn.key_.weight", "blocks.32.attn.key_.bias", 
"blocks.32.attn.value_.weight", "blocks.32.attn.value_.bias", "blocks.33.attn.query_.weight", "blocks.33.attn.query_.bias", "blocks.33.attn.key_.weight", "blocks.33.attn.key_.bias", "blocks.33.attn.value_.weight", "blocks.33.attn.value_.bias", "blocks.34.attn.query_.weight", "blocks.34.attn.query_.bias", "blocks.34.attn.key_.weight", "blocks.34.attn.key_.bias", "blocks.34.attn.value_.weight", "blocks.34.attn.value_.bias", "blocks.35.attn.query_.weight", "blocks.35.attn.query_.bias", "blocks.35.attn.key_.weight", "blocks.35.attn.key_.bias", "blocks.35.attn.value_.weight", "blocks.35.attn.value_.bias", "blocks.36.attn.query_.weight", "blocks.36.attn.query_.bias", "blocks.36.attn.key_.weight", "blocks.36.attn.key_.bias", "blocks.36.attn.value_.weight", "blocks.36.attn.value_.bias", "blocks.37.attn.query_.weight", "blocks.37.attn.query_.bias", "blocks.37.attn.key_.weight", "blocks.37.attn.key_.bias", "blocks.37.attn.value_.weight", "blocks.37.attn.value_.bias", "blocks.38.attn.query_.weight", "blocks.38.attn.query_.bias", "blocks.38.attn.key_.weight", "blocks.38.attn.key_.bias", "blocks.38.attn.value_.weight", "blocks.38.attn.value_.bias", "blocks.39.attn.query_.weight", "blocks.39.attn.query_.bias", "blocks.39.attn.key_.weight", "blocks.39.attn.key_.bias", "blocks.39.attn.value_.weight", "blocks.39.attn.value_.bias", "blocks.40.attn.query_.weight", "blocks.40.attn.query_.bias", "blocks.40.attn.key_.weight", "blocks.40.attn.key_.bias", "blocks.40.attn.value_.weight", "blocks.40.attn.value_.bias", "blocks.41.attn.query_.weight", "blocks.41.attn.query_.bias", "blocks.41.attn.key_.weight", "blocks.41.attn.key_.bias", "blocks.41.attn.value_.weight", "blocks.41.attn.value_.bias", "blocks.42.attn.query_.weight", "blocks.42.attn.query_.bias", "blocks.42.attn.key_.weight", "blocks.42.attn.key_.bias", "blocks.42.attn.value_.weight", "blocks.42.attn.value_.bias", "blocks.43.attn.query_.weight", "blocks.43.attn.query_.bias", "blocks.43.attn.key_.weight", 
"blocks.43.attn.key_.bias", "blocks.43.attn.value_.weight", "blocks.43.attn.value_.bias", "blocks.44.attn.query_.weight", "blocks.44.attn.query_.bias", "blocks.44.attn.key_.weight", "blocks.44.attn.key_.bias", "blocks.44.attn.value_.weight", "blocks.44.attn.value_.bias", "blocks.45.attn.query_.weight", "blocks.45.attn.query_.bias", "blocks.45.attn.key_.weight", "blocks.45.attn.key_.bias", "blocks.45.attn.value_.weight", "blocks.45.attn.value_.bias", "blocks.46.attn.query_.weight", "blocks.46.attn.query_.bias", "blocks.46.attn.key_.weight", "blocks.46.attn.key_.bias", "blocks.46.attn.value_.weight", "blocks.46.attn.value_.bias", "blocks.47.attn.query_.weight", "blocks.47.attn.query_.bias", "blocks.47.attn.key_.weight", "blocks.47.attn.key_.bias", "blocks.47.attn.value_.weight", "blocks.47.attn.value_.bias".
Unexpected key(s) in state_dict: "blocks.0.self_attn.qkv_proj.weight", "blocks.0.self_attn.qkv_proj.bias", "blocks.1.self_attn.qkv_proj.weight", "blocks.1.self_attn.qkv_proj.bias", "blocks.2.self_attn.qkv_proj.weight", "blocks.2.self_attn.qkv_proj.bias", "blocks.3.self_attn.qkv_proj.weight", "blocks.3.self_attn.qkv_proj.bias", "blocks.4.self_attn.qkv_proj.weight", "blocks.4.self_attn.qkv_proj.bias", "blocks.5.self_attn.qkv_proj.weight", "blocks.5.self_attn.qkv_proj.bias", "blocks.6.self_attn.qkv_proj.weight", "blocks.6.self_attn.qkv_proj.bias", "blocks.7.self_attn.qkv_proj.weight", "blocks.7.self_attn.qkv_proj.bias", "blocks.8.self_attn.qkv_proj.weight", "blocks.8.self_attn.qkv_proj.bias", "blocks.9.self_attn.qkv_proj.weight", "blocks.9.self_attn.qkv_proj.bias", "blocks.10.self_attn.qkv_proj.weight", "blocks.10.self_attn.qkv_proj.bias", "blocks.11.self_attn.qkv_proj.weight", "blocks.11.self_attn.qkv_proj.bias", "blocks.12.self_attn.qkv_proj.weight", "blocks.12.self_attn.qkv_proj.bias", "blocks.13.self_attn.qkv_proj.weight", "blocks.13.self_attn.qkv_proj.bias", "blocks.14.self_attn.qkv_proj.weight", "blocks.14.self_attn.qkv_proj.bias", "blocks.15.self_attn.qkv_proj.weight", "blocks.15.self_attn.qkv_proj.bias", "blocks.16.self_attn.qkv_proj.weight", "blocks.16.self_attn.qkv_proj.bias", "blocks.17.self_attn.qkv_proj.weight", "blocks.17.self_attn.qkv_proj.bias", "blocks.18.self_attn.qkv_proj.weight", "blocks.18.self_attn.qkv_proj.bias", "blocks.19.self_attn.qkv_proj.weight", "blocks.19.self_attn.qkv_proj.bias", "blocks.20.self_attn.qkv_proj.weight", "blocks.20.self_attn.qkv_proj.bias", "blocks.21.self_attn.qkv_proj.weight", "blocks.21.self_attn.qkv_proj.bias", "blocks.22.self_attn.qkv_proj.weight", "blocks.22.self_attn.qkv_proj.bias", "blocks.23.self_attn.qkv_proj.weight", "blocks.23.self_attn.qkv_proj.bias", "blocks.24.self_attn.qkv_proj.weight", "blocks.24.self_attn.qkv_proj.bias", "blocks.25.self_attn.qkv_proj.weight", "blocks.25.self_attn.qkv_proj.bias", 
"blocks.26.self_attn.qkv_proj.weight", "blocks.26.self_attn.qkv_proj.bias", "blocks.27.self_attn.qkv_proj.weight", "blocks.27.self_attn.qkv_proj.bias", "blocks.28.self_attn.qkv_proj.weight", "blocks.28.self_attn.qkv_proj.bias", "blocks.29.self_attn.qkv_proj.weight", "blocks.29.self_attn.qkv_proj.bias", "blocks.30.self_attn.qkv_proj.weight", "blocks.30.self_attn.qkv_proj.bias", "blocks.31.self_attn.qkv_proj.weight", "blocks.31.self_attn.qkv_proj.bias", "blocks.32.self_attn.qkv_proj.weight", "blocks.32.self_attn.qkv_proj.bias", "blocks.33.self_attn.qkv_proj.weight", "blocks.33.self_attn.qkv_proj.bias", "blocks.34.self_attn.qkv_proj.weight", "blocks.34.self_attn.qkv_proj.bias", "blocks.35.self_attn.qkv_proj.weight", "blocks.35.self_attn.qkv_proj.bias", "blocks.36.self_attn.qkv_proj.weight", "blocks.36.self_attn.qkv_proj.bias", "blocks.37.self_attn.qkv_proj.weight", "blocks.37.self_attn.qkv_proj.bias", "blocks.38.self_attn.qkv_proj.weight", "blocks.38.self_attn.qkv_proj.bias", "blocks.39.self_attn.qkv_proj.weight", "blocks.39.self_attn.qkv_proj.bias", "blocks.40.self_attn.qkv_proj.weight", "blocks.40.self_attn.qkv_proj.bias", "blocks.41.self_attn.qkv_proj.weight", "blocks.41.self_attn.qkv_proj.bias", "blocks.42.self_attn.qkv_proj.weight", "blocks.42.self_attn.qkv_proj.bias", "blocks.43.self_attn.qkv_proj.weight", "blocks.43.self_attn.qkv_proj.bias", "blocks.44.self_attn.qkv_proj.weight", "blocks.44.self_attn.qkv_proj.bias", "blocks.45.self_attn.qkv_proj.weight", "blocks.45.self_attn.qkv_proj.bias", "blocks.46.self_attn.qkv_proj.weight", "blocks.46.self_attn.qkv_proj.bias", "blocks.47.self_attn.qkv_proj.weight", "blocks.47.self_attn.qkv_proj.bias".
size mismatch for embed.word_embeddings.weight: copying a param with shape torch.Size([12568, 7168]) from checkpoint, the shape in current model is torch.Size([50272, 7168]).
size mismatch for blocks.0.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.0.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.0.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.0.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.1.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.1.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.1.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.1.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.2.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.2.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.2.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.2.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.3.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.3.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.3.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.3.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.4.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.4.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.4.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.4.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.5.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.5.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.5.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.5.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.6.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.6.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.6.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.6.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.7.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.7.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.7.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.7.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.8.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.8.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.8.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.8.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.9.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.9.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.9.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.9.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.10.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.10.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.10.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.10.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.11.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.11.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.11.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.11.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.12.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.12.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.12.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.12.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.13.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.13.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.13.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.13.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.14.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.14.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.14.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.14.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.15.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.15.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.15.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.15.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.16.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.16.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.16.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.16.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.17.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.17.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.17.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.17.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.18.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.18.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.18.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.18.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.19.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.19.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.19.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.19.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.20.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.20.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.20.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.20.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.21.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.21.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.21.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.21.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.22.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.22.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.22.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.22.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.23.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.23.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.23.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.23.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.24.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.24.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.24.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.24.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.25.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.25.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.25.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.25.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.26.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.26.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.26.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.26.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.27.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.27.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.27.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.27.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.28.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.28.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.28.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.28.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.29.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.29.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.29.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.29.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.30.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.30.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.30.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.30.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.31.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.31.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.31.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.31.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.32.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.32.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.32.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.32.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.33.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.33.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.33.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.33.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.34.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.34.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.34.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.34.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.35.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.35.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.35.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.35.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.36.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.36.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.36.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.36.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.37.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.37.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.37.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.37.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.38.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.38.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.38.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.38.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.39.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.39.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.39.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.39.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.40.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.40.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.40.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.40.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.41.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.41.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.41.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.41.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.42.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.42.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.42.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.42.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.43.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.43.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.43.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.43.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.44.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.44.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.44.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.44.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.45.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.45.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.45.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.45.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.46.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.46.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.46.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.46.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for blocks.47.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.47.mlp.dense_1.weight: copying a param with shape torch.Size([3584, 7168]) from checkpoint, the shape in current model is torch.Size([14336, 7168]).
size mismatch for blocks.47.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for blocks.47.mlp.dense_2.weight: copying a param with shape torch.Size([7168, 3584]) from checkpoint, the shape in current model is torch.Size([7168, 14336]).
size mismatch for head.dense.weight: copying a param with shape torch.Size([12568, 3584]) from checkpoint, the shape in current model is torch.Size([50272, 3584]).
Load file time: 13.155 s
load 4 files using 1 procs
Load file time: 13.153 s
I just realized that energonai does not support the OPT ckpt files provided by Meta. If anyone is facing the same issue, use the ckpt files here. At least, that resolves the tensor size mismatch for the opt_6.7B model.
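For anyone reading the log above: every mismatched dimension in the checkpoint is a fixed fraction of what the model expects, which suggests the files were sharded for a different layout than the one the model is built with. Below is a minimal sketch, in plain Python with no torch needed, that parses the `size mismatch` lines and reports the ratio per parameter. The helper names (`parse_shape`, `mismatch_ratios`) are made up for illustration and are not part of energonai or PyTorch; the sample lines are copied from the log.

```python
import re

# Matches one "size mismatch" line as printed by torch's load_state_dict error.
PATTERN = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]+)\]\)"
)


def parse_shape(text):
    """Turn '7168, 896' into (7168, 896)."""
    return tuple(int(x) for x in text.split(","))


def mismatch_ratios(log_text):
    """Map each parameter name to (dim_index, model_dim / ckpt_dim).

    Hypothetical helper: records the first dimension that differs
    between the checkpoint shape and the model shape.
    """
    result = {}
    for m in PATTERN.finditer(log_text):
        ckpt = parse_shape(m.group("ckpt"))
        model = parse_shape(m.group("model"))
        for i, (c, s) in enumerate(zip(ckpt, model)):
            if c != s:
                result[m.group("name")] = (i, s / c)
                break
    return result


# Three lines taken from the error output in this thread.
log = """
size mismatch for blocks.47.attn.dense.weight: copying a param with shape torch.Size([7168, 896]) from checkpoint, the shape in current model is torch.Size([7168, 3584]).
size mismatch for blocks.47.mlp.dense_1.bias: copying a param with shape torch.Size([3584]) from checkpoint, the shape in current model is torch.Size([14336]).
size mismatch for head.dense.weight: copying a param with shape torch.Size([12568, 3584]) from checkpoint, the shape in current model is torch.Size([50272, 3584]).
"""

ratios = mismatch_ratios(log)
for name, (dim, ratio) in ratios.items():
    print(f"{name}: dim {dim} is {ratio:g}x larger in the model")
```

If all ratios come out the same, as they do for these lines, the checkpoint is probably just split differently rather than corrupted, so switching to checkpoint files prepared for energonai is the likely fix.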
Hi, I am having some difficulty loading the pre-trained model weights for OPT_125M provided by Meta. Here are the error messages: Process SpawnProcess-1:
Seems that load_checkpoint() and the data in checkpoint.pt have different naming conventions. Is this caused by a version issue? I am using energonai==0.0.2.
Thanks for your help in advance.
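One way to check the naming-convention suspicion is to diff the parameter names in the checkpoint against the names the model expects. The sketch below works on plain lists of key names; `diff_keys` is a hypothetical helper, and the two sample checkpoint keys are illustrative only, not read from the actual checkpoint.pt. In practice the names would likely come from `torch.load(path, map_location="cpu")` (possibly nested under a sub-key, which is an assumption) and from `model.state_dict().keys()`.

```python
def diff_keys(ckpt_keys, model_keys):
    """Return (missing, unexpected) key names, each sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint the model does not know
    """
    ckpt, model = set(ckpt_keys), set(model_keys)
    return sorted(model - ckpt), sorted(ckpt - model)


# Illustrative names only: the model-side names follow the pattern seen
# in the error log, the checkpoint-side names are assumed for the example.
ckpt_keys = [
    "decoder.layers.0.fc1.weight",
    "decoder.layers.0.fc1.bias",
]
model_keys = [
    "blocks.0.mlp.dense_1.weight",
    "blocks.0.mlp.dense_1.bias",
]

missing, unexpected = diff_keys(ckpt_keys, model_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

If both lists are non-empty and roughly the same length, the two sides are probably using different naming schemes and the checkpoint needs a renaming step before loading.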