turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Cannot load models saved with HF transformers due to shared tensors in safetensors #408

Closed AndrewRyanChama closed 2 weeks ago

AndrewRyanChama commented 2 months ago

I'm trying to open a checkpoint that was saved with Hugging Face transformers, but it fails:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen1.5-0.5B')
model.save_pretrained("testmodel")

Then, when opening it with exllamav2, I get this error:

    raise ValueError(f" ## Could not find {prefix}.* in model")
ValueError:  ## Could not find lm_head.* in model

The saved safetensors file no longer contains an lm_head tensor. I believe this is due to the torch shared-tensors handling in safetensors: https://huggingface.co/docs/safetensors/torch_shared_tensors
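This can be confirmed by listing the tensor names in the saved file. A minimal sketch; the single-file name model.safetensors and the Qwen-style key names are assumptions:

```python
from safetensors import safe_open

# Path assumes save_pretrained() wrote a single, unsharded safetensors file
path = "testmodel/model.safetensors"

with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

print("lm_head.weight" in keys)             # False: the tied head was deduplicated on save
print("model.embed_tokens.weight" in keys)  # True: only the embedding copy is stored
```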

The expected behavior is that exllamav2 should be able to load checkpoints saved by Hugging Face transformers.

turboderp commented 2 months ago

You can try converting the model with the -unshare flag using the util/convert_safetensors.py script. ExLlama does support tied embeddings, but I didn't enable it for Qwen because none of the official Qwen releases actually seem to use shared tensors. Even though tied embeddings are set to True for the release model, the .safetensors file that ships with it actually has separate embedding and head tensors.
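As an alternative to the conversion script, the shared tensor can be duplicated manually before loading. A rough sketch, assuming a single model.safetensors file and the standard Qwen2 key names:

```python
from safetensors.torch import load_file, save_file

path = "testmodel/model.safetensors"
tensors = load_file(path)

# With tied embeddings, safetensors keeps only the embedding copy,
# so recreate lm_head.weight as an explicit, separate duplicate.
if "lm_head.weight" not in tensors:
    tensors["lm_head.weight"] = tensors["model.embed_tokens.weight"].clone()

save_file(tensors, path, metadata={"format": "pt"})
```

Rewriting the file this way should satisfy the lm_head.* lookup that produced the error above.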