turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Header too large error when running benchmark #242

Closed DKormann closed 11 months ago

DKormann commented 11 months ago

I get the error safetensors_rust.SafetensorError: Error while deserializing header: HeaderTooLarge

when running python test_benchmark_inference.py -d ../models/Llama-2-13B-chat-GPTQ on my Ubuntu machine.

The suggested fix from #176 (installing transformers from HF) didn't help.

turboderp commented 11 months ago

The solution to #176 wasn't installing Transformers, it was downloading the model again. It seems to happen quite a bit that people end up with corrupted tokenizer.model files. Is everyone using some tool to download models? It seems like an oddly specific error to come up so many times otherwise.

Anyway, try downloading the tokenizer.model file again.
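A quick way to confirm this diagnosis: a safetensors file starts with an 8-byte little-endian length of a JSON header, so an incomplete download (or a git-lfs pointer stub left behind when `git lfs` wasn't enabled) decodes to a nonsensical header length, which surfaces as `HeaderTooLarge`. The following is a minimal sketch (function name and messages are illustrative, not part of exllama) for checking a downloaded file:

```python
import json
import struct
from pathlib import Path

def diagnose_safetensors(path):
    """Rough check whether a .safetensors file is a git-lfs stub, truncated, or plausible."""
    data = Path(path).read_bytes()
    # A git-lfs pointer stub is a small text file beginning with this line,
    # not the actual weights -- re-download with git lfs installed and enabled.
    if data.startswith(b"version https://git-lfs.github.com/spec/v1"):
        return "git-lfs pointer stub"
    # safetensors layout: first 8 bytes = little-endian u64 length of a JSON header.
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > len(data) - 8:
        # A corrupted or truncated file typically decodes to an absurd length here.
        return "header larger than file (corrupted/incomplete download)"
    json.loads(data[8 : 8 + header_len])  # header must parse as JSON
    return "ok"
```

Running this on the offending file should report a pointer stub or a corrupted header rather than "ok", confirming that re-downloading is the fix.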

DKormann commented 11 months ago

Thank you! I fixed it by re-downloading the model with git lfs enabled. I had thought it was OK to ignore the git-lfs advice on HF since that worked for smaller models.