jquesnelle / yarn

YaRN: Efficient Context Window Extension of Large Language Models
MIT License

cannot load safetensor: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])) #47

Closed tpoisonooo closed 7 months ago

tpoisonooo commented 7 months ago

After checking #45 and #40 and making some hard-coded modifications, these commands passed:

# training
accelerate launch finetune.py \
  --output-dir output/yarn-7b-8k \
  --model NousResearch/Llama-2-7b-hf \
  --scaling-factor 2 \
  --wandb yarn \
  --dataset emozilla/yarn-train-tokenized-8k-llama \
  --deepspeed

# save
accelerate launch finetune.py \
  --output-dir output/yarn-7b-8k \
  --model NousResearch/Llama-2-7b-hf \
  --save-only \
  --scaling-factor 2 \
  --wandb yarn \
  --output-dir output-8k-save \
  --dataset emozilla/yarn-train-tokenized-8k-llama \
  --deepspeed

And I got these files:

(torch2) root@9b2ed2383075:/workspace/yarn/output/yarn-7b-8k# tree
.
|-- config.json
|-- model-00001-of-00003.safetensors
|-- model-00002-of-00003.safetensors
|-- model-00003-of-00003.safetensors
|-- model.safetensors
`-- model.safetensors.index.json

To load it with passkey.py, I merged these safetensors into the original NousResearch/Llama-2-7b-hf and got this error:

(torch2) root@9b2ed2383075:/workspace/yarn# python3 eval/passkey.py -m /workspace/models/Llama-2-7b-hf/
Determining sequence lengths: 100%|██████████| 6/6 [00:04<00:00,  1.48it/s]
Model:   0%|          | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/workspace/yarn/eval/passkey.py", line 127, in <module>
    main(add_args(parser).parse_args())
  File "/workspace/yarn/eval/passkey.py", line 90, in main
    loaded = load_model_and_apply_patches(model, args)
  File "/workspace/yarn/eval/model_loader.py", line 215, in load_model_and_apply_patches
    return apply_patches(load_model(model, args), args)
  File "/workspace/yarn/eval/model_loader.py", line 90, in load_model
    loaded = model_cls.from_pretrained(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 743, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/torch2/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 285, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([0]) in "weight" (which has shape torch.Size([32000, 4096])), this look incorrect.

I noticed that your official https://huggingface.co/NousResearch/Yarn-Llama-2-7b-64k does not need any safetensor merging and can be tested successfully.

Did I miss a model conversion script?
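For anyone hitting this, a quick way to find which file actually holds the zero-sized `"weight"` tensor is to read the safetensors header directly. This is a stdlib-only sketch that assumes the documented on-disk layout (an 8-byte little-endian header length followed by a JSON metadata block); the function name is my own, not part of the repo:

```python
import json
import struct

def safetensors_shapes(path):
    """Return {tensor_name: shape} by parsing only the safetensors header:
    8 bytes little-endian header length, then that many bytes of JSON."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional non-tensor entry in the header.
    return {name: meta["shape"]
            for name, meta in header.items()
            if name != "__metadata__"}
```

Running it over each `model*.safetensors` in the output directory should show the bad file reporting `[0]` for the embedding weight instead of `[32000, 4096]`.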

tpoisonooo commented 7 months ago

Fixed after debugging the transformers source code for a whole day.

praveenkanithi commented 6 months ago

Hi @tpoisonooo, I'm facing the same issue. How did you solve it? Is it specific to the transformers version, or do I need to convert the safetensors? Thanks.

praveenkanithi commented 6 months ago

It looks like the model-saving code that uses accelerator writes a `model.safetensors` file alongside the true sharded safetensors files, which causes the issue. Removing that file seems to have solved it. I'm not sure, though, why huggingface loads that file despite the weight map specified in `model.safetensors.index.json`.
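The cleanup described above can be sketched as a small helper. This is my own illustrative code, not something from the repo: it deletes the monolithic `model.safetensors` only when a sharded index exists and that index never references the monolithic file, so a legitimately single-file checkpoint is left alone:

```python
import json
import os

def remove_stray_checkpoint(model_dir):
    """If a sharded index exists, a monolithic model.safetensors sitting
    next to it is likely the stray file written during saving; remove it.
    Returns True if a file was removed."""
    index_path = os.path.join(model_dir, "model.safetensors.index.json")
    stray_path = os.path.join(model_dir, "model.safetensors")
    if not (os.path.exists(index_path) and os.path.exists(stray_path)):
        return False
    with open(index_path) as f:
        index = json.load(f)
    # Only delete when the weight map points exclusively at shard files.
    if "model.safetensors" in set(index.get("weight_map", {}).values()):
        return False
    os.remove(stray_path)
    return True
```

After removing the stray file, `from_pretrained` should fall back to the sharded files listed in the index.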

waterluck commented 1 month ago

@praveenkanithi @tpoisonooo May I ask how you solved this problem? I only get the safetensors model, but I get the error in the title when loading it.