jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Encountered an issue while loading the model using transformers #179

Open Yukang-Lin opened 5 months ago

Yukang-Lin commented 5 months ago

I tried to load the model with transformers:

small_model = AutoModelForCausalLM.from_pretrained(
    approx_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

but an error occurs:

OSError: Unable to load weights from pytorch checkpoint file for '/mnt/data3/lyk/models/tinyllama-1.1b/pytorch_model.bin' at '/mnt/data3/lyk/models/tinyllama-1.1b/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

When I set from_tf=True, another error occurs:

AttributeError: module transformers has no attribute TFLlamaForCausalLM

My package versions are torch 2.1.0 and transformers 4.39.3.
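Note that the from_tf=True hint is transformers' generic fallback message when it fails to read the file as a PyTorch checkpoint; there is no TFLlamaForCausalLM class in transformers, so that path cannot work for a Llama model. A quick diagnostic (a sketch, not from the original report) is to load the checkpoint file directly with torch.load and inspect the underlying error:

import torch

# Path taken from the error message above.
ckpt_path = "/mnt/data3/lyk/models/tinyllama-1.1b/pytorch_model.bin"

try:
    state_dict = torch.load(ckpt_path, map_location="cpu")
    print(f"OK: checkpoint contains {len(state_dict)} entries")
except Exception as exc:
    # Whatever prints here is the real reason from_pretrained failed.
    print(f"Underlying error: {type(exc).__name__}: {exc}")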

RmZeta2718 commented 4 months ago

I encountered the same issue. It seems to be a bug in scripts/convert_lit_checkpoint.py: the model cannot be loaded due to a UnicodeDecodeError (transformers 4.40.1).

Traceback (most recent call last):
  File "/home/user/.conda/envs/py39pt23/lib/python3.9/site-packages/transformers/modeling_utils.py", line 542, in load_state_dict
    if f.read(7) == "version":
  File "/home/user/.conda/envs/py39pt23/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 64: invalid start byte
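For context, the line shown in the traceback is where transformers opens the checkpoint in text mode and reads its first bytes to check for the TorchScript "version" marker; on a pickled .bin whose leading bytes are not valid UTF-8, that read raises the UnicodeDecodeError. A minimal sketch of the same failure mode, using a hypothetical dummy.bin file:

# Write a few non-UTF-8 bytes, like the start of a pickled checkpoint.
with open("dummy.bin", "wb") as f:
    f.write(bytes([0x80, 0x02, 0xb4]))

# Reading it back in text mode, as load_state_dict does, fails the same way.
with open("dummy.bin", encoding="utf-8") as f:
    f.read(7)  # UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 ...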

Loading works in transformers 4.35.0, so I loaded the model with that version and saved it again using the standard API; after that, the model can be loaded with newer transformers versions. Note that I have safetensors installed, so the local model is saved as model.safetensors:

from transformers import AutoModelForCausalLM

# transformers 4.35.0: load the original checkpoint and re-save it locally
model = AutoModelForCausalLM.from_pretrained(model_path)
model.save_pretrained("local/path")  # written as model.safetensors

# transformers 4.40.1: the re-saved copy loads fine
model = AutoModelForCausalLM.from_pretrained("local/path")  # ok
# model = AutoModelForCausalLM.from_pretrained(model_path)  # UnicodeDecodeError, OSError
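To make the output format explicit instead of depending on whether safetensors happens to be installed, save_pretrained also accepts a safe_serialization flag (an optional tweak to the snippet above):

model.save_pretrained("local/path", safe_serialization=True)  # force model.safetensors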