turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

HeaderTooLarge error #176

Closed abacaj closed 1 year ago

abacaj commented 1 year ago

Hi, I was trying to run Llama 2 70B with:

python3 test_benchmark_inference.py -d ./Llama-2-70B-GPTQ

and ran into this error:

  File "/home/anton/personal/transformer-experiments/exllama/model.py", line 697, in __init__
    with safe_open(self.config.model_path, framework = "pt", device = "cpu") as f:
safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

Model file: https://huggingface.co/TheBloke/Llama-2-70B-GPTQ
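For anyone else hitting HeaderTooLarge: a quick sanity check is to read the 8-byte length prefix that every .safetensors file starts with. This is an illustrative sketch, not part of exllama, and `inspect_safetensors_header` is a made-up helper name. An implausibly large header length usually means the file on disk is a git-lfs pointer stub or a truncated download rather than the real weights.

```python
import json
import struct

def inspect_safetensors_header(path, max_header_bytes=100 * 1024 * 1024):
    """Read the 8-byte little-endian u64 length prefix of a .safetensors file.

    A valid file starts with the JSON header size; a huge value here
    usually means the file is not real safetensors data (e.g. a
    git-lfs pointer stub left by an incomplete download).
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return "file too small to be safetensors"
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > max_header_bytes:
            return (f"implausible header length {header_len}: "
                    "likely a corrupt or LFS-pointer file")
        header = json.loads(f.read(header_len))
        return f"ok: {len(header)} entries in header"
```

Running this on a healthy shard reports a small header; on a pointer stub the first eight bytes are ASCII text, which decodes to an enormous bogus length.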

abacaj commented 1 year ago

Fixed with:

pip install git+https://github.com/huggingface/transformers

Now I'm getting another error:

  File "/home/anton/personal/transformer-experiments/exllama/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: src/sentencepiece_processor.cc(1101) [model_proto->ParseFromArray(serialized.data(), serialized.size())] 

abacaj commented 1 year ago

Fixed; it turned out the tokenizer.model wasn't downloaded properly when I pulled the 70B HF repo.
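A common cause of this: cloning the repo without git-lfs installed leaves small text pointer stubs in place of the LFS-tracked binaries, and sentencepiece then fails to parse the stub. A hedged sketch for spotting that, assuming the standard git-lfs pointer format (`looks_like_lfs_pointer` is a hypothetical helper, not from this repo):

```python
import os

def looks_like_lfs_pointer(path):
    """Return True if the file appears to be a git-lfs pointer stub
    rather than the real binary. Pointer files are tiny text files
    (roughly 130 bytes) starting with 'version https://git-lfs'."""
    if os.path.getsize(path) > 1024:
        return False
    with open(path, "rb") as f:
        return f.read(24).startswith(b"version https://git-lfs")
```

If this returns True for tokenizer.model, re-running `git lfs pull` inside the repo (or re-downloading that file from the Hugging Face web UI) should fix it.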