jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Apache License 2.0

All 'rotary_emb.inv_freq' cannot be loaded with AutoModel or LlamaModel #127

Closed · becxer closed this 3 months ago

becxer commented 6 months ago

Hi, I'm trying to use the checkpoint with its weights frozen. However, the rotary embedding parameters fail to load with the following code,

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")

and it keeps showing this warning message:

Some weights of LlamaForCausalLM were not initialized from the model checkpoint at TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T and are newly initialized:
['model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq',
'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq',
'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq',
'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq',
'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq',
'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq',
'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq',
'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.17.self_attn.rotary_emb.inv_freq',
'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq',
'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq',
'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
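
For context, `inv_freq` is not a learned weight: it is a fixed rotary-embedding buffer that recent transformers versions recompute from the config at load time, so "newly initialized" loses no information here. A minimal sketch of the standard RoPE computation, assuming TinyLlama's hidden_size=2048, num_attention_heads=32, and the default rope_theta=10000:

```python
import torch

head_dim = 2048 // 32  # hidden_size / num_attention_heads = 64 for this checkpoint
base = 10000.0         # default rope_theta

# Standard RoPE inverse frequencies; identical for every layer,
# so dropping them from the checkpoint is lossless.
inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
print(inv_freq.shape)  # torch.Size([32])
```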
calvintwr commented 4 months ago

I managed to get past this by converting the .pth files to safetensors using code from https://gist.githubusercontent.com/epicfilemcnulty/1f55fd96b08f8d4d6693293e37b4c55e/raw/3e099f23cdda9c38104de83d2108fe891f16d8ca/2safetensors.py
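
If you'd rather not pull the full gist, the conversion it performs is roughly the following sketch (file names are placeholders; assumes the checkpoint is a plain PyTorch state dict):

```python
import torch
from safetensors.torch import save_file

# Load the .pth/.bin checkpoint onto CPU.
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# safetensors rejects tensors that share storage, so make each one owned and contiguous.
state_dict = {k: v.clone().contiguous() for k, v in state_dict.items()}

# The "pt" format tag lets transformers load the file directly.
save_file(state_dict, "model.safetensors", metadata={"format": "pt"})
```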

becxer commented 3 months ago

This was resolved by updating the transformers library to version 4.35.0, the version mentioned in config.json.
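
A quick way to confirm the fix after upgrading (e.g. with pip install -U transformers) is to check the installed version against the transformers_version field in config.json before loading:

```python
import transformers
print(transformers.__version__)  # should be >= the transformers_version in config.json

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
)  # on a matching version, no inv_freq warning is printed
```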