
MobileLLM safetensors seem to be missing model.embed_tokens.weight #34759

Open avishaiElmakies opened 1 day ago

avishaiElmakies commented 1 day ago

Who can help?

@ArthurZucker

Reproduction

```python
from transformers import AutoModelForCausalLM

mobilellm = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True)
```

will output:

```
Some weights of MobileLLMForCausalLM were not initialized from the model checkpoint at facebook/MobileLLM-125M and are newly initialized: ['model.embed_tokens.weight']
```

and the resulting weights are random. When loading with `use_safetensors=False`, everything works as expected.
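For reference, this is the workaround that currently loads correct weights (`use_safetensors` is the standard `from_pretrained` argument that forces the PyTorch `.bin` checkpoint):

```python
from transformers import AutoModelForCausalLM

# Workaround: skip the safetensors checkpoint and load the PyTorch .bin
# weights instead; with these, model.embed_tokens.weight loads correctly.
mobilellm = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M",
    trust_remote_code=True,
    use_safetensors=False,
)
```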

Expected behavior

Loading with safetensors should work the same as loading without them.

mayankagarwals commented 18 hours ago

Hi 👋 I'm able to reproduce this, looking into it!

mayankagarwals commented 12 hours ago

Can you please provide the code snippet where you are not seeing any error (without using safetensors)? @avishaiElmakies

avishaiElmakies commented 12 hours ago

There should be a single "error" about `lm_head.weight`, since the model uses weight tying for the embedding and output layers. Both safetensors and normal loading do this.
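To make the tying concrete, here is a minimal sketch; the `lm_head` attribute and the `tie_word_embeddings` config field assume the usual Llama-style layout for this model:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M",
    trust_remote_code=True,
    use_safetensors=False,  # load the .bin weights, which work
)

# With tied embeddings, the output head shares its weight tensor with the
# input embedding, so only one of the two needs to be in the checkpoint.
print(model.config.tie_word_embeddings)                              # expected: True
print(model.lm_head.weight is model.get_input_embeddings().weight)   # expected: True
```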

The problem is that when using safetensors, the embedding weights seem to be missing, which causes problems with both the embedding layer and the output layer.
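One way to confirm this is to list the tensor names stored in the safetensors file itself (a sketch using `huggingface_hub` and `safetensors`; the single-shard filename `model.safetensors` is an assumption):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Download only the safetensors checkpoint and list the tensors it stores.
path = hf_hub_download("facebook/MobileLLM-125M", "model.safetensors")
with safe_open(path, framework="pt") as f:
    keys = set(f.keys())

# If the report is right, the embedding weight is absent from the file.
print("model.embed_tokens.weight" in keys)
```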

Maybe I should have been clearer about that in the bug report (sorry about that).