oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

PHI-3 128K GGUF - Model Fails to Load #5930

Open dmsweetser opened 5 months ago

dmsweetser commented 5 months ago

Describe the bug

I'm attempting to load https://huggingface.co/QuantFactory/Phi-3-mini-128k-instruct-GGUF with default options and I receive this error:

Traceback (most recent call last):

File "C:\Files\text-generation-web-ui\modules\ui_model_menu.py", line 247, in load_model_wrapper

shared.model, shared.tokenizer = load_model(selected_model, loader)

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\models.py", line 94, in load_model

output = load_func_map[loader](model_name)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\models.py", line 271, in llamacpp_loader

model, tokenizer = LlamaCppModel.from_pretrained(model_file)

                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\llamacpp_model.py", line 77, in from_pretrained

result.model = Llama(**params)

               ^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\installer_files\env\Lib\site-packages\llama_cpp\llama.py", line 323, in __init__

self._model = _LlamaModel(

              ^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\installer_files\env\Lib\site-packages\llama_cpp\_internals.py", line 55, in __init__

raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\Phi-3-mini-128k-instruct.Q4_K_M.gguf

Is there an existing issue for this?

Reproduction

Download and load the model:

https://huggingface.co/QuantFactory/Phi-3-mini-128k-instruct-GGUF/resolve/main/Phi-3-mini-128k-instruct.Q4_K_M.gguf?download=true
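The same failure can be reproduced outside the web UI with llama-cpp-python, the package the traceback passes through. A minimal sketch, assuming the file was downloaded into the web UI's models directory:

```python
# Minimal load attempt via llama-cpp-python, bypassing the web UI entirely.
# The path below assumes the default models directory of text-generation-webui.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Phi-3-mini-128k-instruct.Q4_K_M.gguf",
    n_ctx=2048,  # small context; we only care whether the file loads at all
)
```

If this raises the same ValueError, the failure is in llama.cpp's GGUF loading rather than in text-generation-webui itself.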

Screenshot

No response

Logs

Traceback (most recent call last):

File "C:\Files\text-generation-web-ui\modules\ui_model_menu.py", line 247, in load_model_wrapper

shared.model, shared.tokenizer = load_model(selected_model, loader)

                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\models.py", line 94, in load_model

output = load_func_map[loader](model_name)

         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\models.py", line 271, in llamacpp_loader

model, tokenizer = LlamaCppModel.from_pretrained(model_file)

                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\modules\llamacpp_model.py", line 77, in from_pretrained

result.model = Llama(**params)

               ^^^^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\installer_files\env\Lib\site-packages\llama_cpp\llama.py", line 323, in init

self._model = _LlamaModel(

              ^^^^^^^^^^^^
File "C:\Files\text-generation-web-ui\installer_files\env\Lib\site-packages\llama_cpp_internals.py", line 55, in init

raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\Phi-3-mini-128k-instruct.Q4_K_M.gguf

System Info

Windows 10
NVIDIA GTX 1660 Super
LVPS24 commented 5 months ago

Same error with https://huggingface.co/pjh64/Phi-3-mini-128K-Instruct.gguf

Malrama commented 5 months ago

That's because those .gguf files seem to be corrupted or bad. The GGUF files from here work flawlessly: https://huggingface.co/PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed
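A quick way to tell a truncated or corrupted download apart from an unsupported model is to inspect the GGUF header: every valid GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian uint32 format version. A minimal sketch, using the file path from this issue:

```python
# Sanity-check a GGUF download by verifying the magic bytes and reading the version.
import struct

def check_gguf(path: str) -> None:
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path}: bad magic {magic!r}; not a valid GGUF file")
        (version,) = struct.unpack("<I", f.read(4))
        print(f"{path}: GGUF version {version}")

check_gguf("models/Phi-3-mini-128k-instruct.Q4_K_M.gguf")
```

Note that a passing header check only rules out gross corruption; a file can still fail to load if the installed llama.cpp build does not yet support the model's architecture or metadata, which is what the next comment points to.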

dmsweetser commented 5 months ago

It looks like they are still working on the 128k variant in llama.cpp:

https://github.com/ggerganov/llama.cpp/issues/6849

dmsweetser commented 5 months ago

One other note - I can load https://huggingface.co/PrunaAI/Phi-3-mini-128k-instruct-GGUF-Imatrix-smashed, but there are irregularities when I use a larger prompt (see the sketch below). I suspect this is related to the remaining llama.cpp work.
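One way to narrow down such irregularities is to probe the same model at increasing prompt lengths and watch where the output degrades. A sketch; the probe sizes, filler text, and context size are arbitrary choices, and the model path should be whichever GGUF actually loads:

```python
# Probe output quality as the prompt grows; sizes here are arbitrary test points.
from llama_cpp import Llama

llm = Llama(
    model_path="models/Phi-3-mini-128k-instruct.Q4_K_M.gguf",  # substitute a loadable GGUF
    n_ctx=8192,  # assumes enough memory for an 8k context window
)

filler = "The quick brown fox jumps over the lazy dog. "
for n_chars in (500, 2000, 8000):
    prompt = filler * (n_chars // len(filler)) + "\nSummarize the text above in one sentence."
    out = llm(prompt, max_tokens=64)
    print(n_chars, repr(out["choices"][0]["text"][:80]))
```

If short prompts behave and long ones do not, that points at the long-context (128k) handling tracked in the llama.cpp issue linked above.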