
MobileLLM safetensors seem to be missing model.embed_tokens.weight #34759

Open avishaiElmakies opened 1 day ago

avishaiElmakies commented 1 day ago

Who can help?

@ArthurZucker

Reproduction

```python
from transformers import AutoModelForCausalLM

mobilellm = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True)
```

will output:

```
Some weights of MobileLLMForCausalLM were not initialized from the model checkpoint at facebook/MobileLLM-125M and are newly initialized: ['model.embed_tokens.weight']
```

and the resulting weights are random. When loading with `use_safetensors=False`, everything works as expected.
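For reference, this is the workaround that currently loads correct weights (`use_safetensors` is the standard `from_pretrained` argument that forces the PyTorch `.bin` checkpoint):

```python
from transformers import AutoModelForCausalLM

# Workaround: skip the safetensors checkpoint and load the PyTorch .bin
# weights instead; with these, model.embed_tokens.weight loads correctly.
mobilellm = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M",
    trust_remote_code=True,
    use_safetensors=False,
)
```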

Expected behavior

Loading with safetensors should work the same as loading without them.

mayankagarwals commented 18 hours ago

Hi 👋 I'm able to reproduce this, looking into it!

mayankagarwals commented 12 hours ago

Can you please provide the code snippet where you are not seeing any error (without using safetensors)? @avishaiElmakies

avishaiElmakies commented 12 hours ago

There should be a single "error" about `lm_head.weight`, since the model uses weight tying for the embedding and output layers. Both safetensors and normal loading do this.
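To make the tying concrete, here is a minimal sketch; the `lm_head` attribute and the `tie_word_embeddings` config field assume the usual Llama-style layout for this model:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M",
    trust_remote_code=True,
    use_safetensors=False,  # load the .bin weights, which work
)

# With tied embeddings, the output head shares its weight tensor with the
# input embedding, so only one of the two needs to be in the checkpoint.
print(model.config.tie_word_embeddings)                              # expected: True
print(model.lm_head.weight is model.get_input_embeddings().weight)   # expected: True
```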

The problem is that when using safetensors, the embedding weights seem to be missing, which causes problems with both the embedding layer and the output layer.
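One way to confirm this is to list the tensor names stored in the safetensors file itself (a sketch using `huggingface_hub` and `safetensors`; the single-shard filename `model.safetensors` is an assumption):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the safetensors checkpoint and list the tensors it stores.
path = hf_hub_download("facebook/MobileLLM-125M", "model.safetensors")
with safe_open(path, framework="pt") as f:
    keys = set(f.keys())

# If the report is right, the embedding weight is absent from the file.
print("model.embed_tokens.weight" in keys)
```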

Maybe I should have been clearer about that in the bug report (sorry about that).