turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Windows 10, Oobabooga, inability to load some models #173

Closed homeworkace closed 9 months ago

homeworkace commented 9 months ago

As first discussed on an HF model page: https://huggingface.co/LoneStriker/lzlv_70b_fp16_hf-2.4bpw-h6-exl2/discussions/1#655cce70b53face4cfe7aa93. The model author suggested this is a Windows-specific issue. Cheers!

Traceback (most recent call last):
  File "A:\LLaMa\text-generation-webui\modules\ui_model_menu.py", line 210, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\LLaMa\text-generation-webui\modules\models.py", line 85, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\LLaMa\text-generation-webui\modules\models.py", line 363, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "A:\LLaMa\text-generation-webui\modules\exllamav2_hf.py", line 162, in from_pretrained
    config.prepare()
  File "A:\LLaMa\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\config.py", line 111, in prepare
    with safe_open(st_file, framework = "pt", device = "cpu") as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: MetadataIncompleteBuffer
turboderp commented 9 months ago

This is most likely because the model wasn't downloaded correctly. Check that the .safetensors files look sane, i.e. that you don't have one very small output.safetensors stub alongside the actual file named output.safetensors?download=True or something. I've seen that a few times, at least.
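For anyone else hitting this: a quick way to spot that kind of leftover stub is to list the .safetensors files in the model directory and flag anything suspiciously small. This is only an illustrative sketch; `find_suspect_safetensors` and the size threshold are made up for this comment, not part of exllamav2 or TGW:

```python
from pathlib import Path

def find_suspect_safetensors(model_dir, min_bytes=1_000_000):
    """Return names of .safetensors-like files that look too small to be
    real weight shards (likely stubs from a failed/partial download)."""
    suspects = []
    for p in sorted(Path(model_dir).iterdir()):
        # Match anything containing ".safetensors" so renamed stubs like
        # "output.safetensors?download=True" are caught too.
        if p.is_file() and ".safetensors" in p.name:
            if p.stat().st_size < min_bytes:
                suspects.append(p.name)
    return suspects
```

Anything this flags is worth deleting and re-downloading before trying to load the model again.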

homeworkace commented 9 months ago

I couldn't find what you were referring to, but I think I've solved the problem.

The directory for LoneStriker/lzlv_70b_fp16_hf-2.4bpw-h6-exl2:

[screenshot]

I had downloaded it twice. The earlier copy looks different, but neither of them works:

[screenshot]

Then I looked at a model with a similar BPW that does work. I suspected that, for the broken models, one of the output files wasn't fully downloaded:

[screenshot]

I used Oobabooga to download the models again, but forgot to move the old models to another folder first. It turned out that only the incomplete portion of the output file was re-downloaded, and after that the model as a whole worked again. It's weird that Ooba considered the download done the first time around when it was clearly incomplete, but I think that issue is out of your reach. Thanks for your help.

turboderp commented 9 months ago

Yeah, HF downloads can fail quite randomly, it seems. I'm not sure if there's any way to detect that in the TGW download function. I'll think about adding a more informative error message, since the MetadataIncompleteBuffer exception can pretty much only mean a corrupted or incomplete safetensors file.
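For reference, a safetensors file starts with an 8-byte little-endian length prefix, followed by a JSON header whose data_offsets describe where each tensor's bytes end. That makes truncation detectable up front by comparing the actual file size against what the header promises. A rough sketch of such a check (`safetensors_is_complete` is a hypothetical helper written for this comment, not an existing exllamav2 or safetensors API):

```python
import json
import os
import struct

def safetensors_is_complete(path):
    """Heuristic integrity check: verify the file is at least as large as
    its own header claims (length prefix + JSON header + tensor data)."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            return False  # can't even read the length prefix
        header_len = struct.unpack("<Q", prefix)[0]
        header_bytes = f.read(header_len)
        if len(header_bytes) < header_len:
            return False  # header itself is truncated
        try:
            header = json.loads(header_bytes)
        except ValueError:
            return False  # header bytes aren't valid JSON
    # Largest end offset of any tensor; "__metadata__" carries no data.
    data_end = max(
        (v["data_offsets"][1] for k, v in header.items() if k != "__metadata__"),
        default=0,
    )
    return size >= 8 + header_len + data_end
```

A check along these lines, run right after download, would catch the partial files before they ever reach safe_open.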