oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Error loading model #3251

Closed: CriIzel closed this issue 1 year ago

CriIzel commented 1 year ago

Describe the bug

Whenever I try to load the model, error shows

Is there an existing issue for this?

  • [x] I have searched the existing issues

Reproduction

AMD is not supported, so I am using CPU mode.

CMD_Flags= --chat --cpu

model: https://huggingface.co/mayaeary/pygmalion-6b_dev-4bit-128g

Screenshot

No response

Logs

Traceback (most recent call last):
  File "E:\AI_Programs\oobabooga\text-generation-webui\server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "E:\AI_Programs\oobabooga\text-generation-webui\modules\models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "E:\AI_Programs\oobabooga\text-generation-webui\modules\models.py", line 314, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "E:\AI_Programs\oobabooga\text-generation-webui\modules\AutoGPTQ_loader.py", line 56, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 94, in from_quantized
    return quant_func(
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_base.py", line 749, in from_quantized
    make_quant(
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
    make_quant(
  [Previous line repeated 1 more time]
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 84, in make_quant
    new_layer = QuantLinear(
  File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\nn_modules\qlinear\qlinear_cuda_old.py", line 83, in __init__
    self.autogptq_cuda = autogptq_cuda_256
NameError: name 'autogptq_cuda_256' is not defined
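
An editorial note on the failing line: in AutoGPTQ of this era, autogptq_cuda_256 is a compiled CUDA extension imported inside a try/except at the top of qlinear_cuda_old.py. When that import fails, only a warning is logged and the name is never bound, so the first use raises exactly this NameError. A minimal Python sketch of the pattern, reconstructed from the traceback rather than copied from the library:

# Sketch of the import guard that produces the NameError above (reconstructed;
# the real module is auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py).
try:
    import autogptq_cuda_256  # compiled CUDA kernel extension
except ImportError:
    print("CUDA extension not installed.")  # warning only; the name stays unbound

class QuantLinear:
    def __init__(self):
        # If the import above failed, this line raises:
        # NameError: name 'autogptq_cuda_256' is not defined
        self.autogptq_cuda = autogptq_cuda_256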

System Info

Intel i7-4770 CPU @ 3.40 GHz
8192MB RAM
RX 570 GPU (4075MB VRAM)
1TB HDD (oobabooga installed)
120GB SSD (System)

aurumh commented 1 year ago

Same issue. To be more precise, I get:

Traceback (most recent call last):
  File "C:\Users\User\Desktop\text-generation-webui\server.py", line 142, in download_model_wrapper
    model, branch = downloader.sanitize_model_and_branch_names(model, branch)
  File "C:\Users\User\Desktop\text-generation-webui\download-model.py", line 37, in sanitize_model_and_branch_names
    if model[-1] == '/':
IndexError: string index out of range
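
For context, the check that fails is visible in the traceback itself: sanitize_model_and_branch_names inspects the last character of the model name, so submitting an empty model field reproduces exactly this IndexError. A minimal sketch (the body of the if is reconstructed, not confirmed here):

model = ""  # empty "Download model" field

# Indexing the last character of an empty string raises
# IndexError: string index out of range
if model[-1] == '/':
    model = model[:-1]  # presumably strips a trailing slash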

emoshunaldamage commented 1 year ago

Same problem; I cannot load any models.

I get identical error messages when loading the Pygmalion model.

Attempting a LLaMA load gives some different error messages, but it also contains the first two frames above (server.py line 68, models.py line 79), followed by:

  File "C:\GPT\textgen\modules\models.py", line 149, in huggingface_loader
    model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=shared.args.trust_remote_code)

  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\models\auto\auto_factory.py", line 493, in from_pretrained
    return model_class.from_pretrained(
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 3260, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 682, in _load_state_dict_into_meta_model
    param = param.to(dtype)

Flanua commented 1 year ago

> *(quoting CriIzel's original report above in full)*

If you are only using it in CPU mode, then reinstall the web UI with CPU mode only; I was able to fix a lot of errors that way. P.S.: It also looks like you are using a GPTQ model (safetensors), and for CPU mode you need a GGML model.
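
A minimal sketch of that setup, assuming a GGML build of the model has already been downloaded into the models folder (the folder name below is hypothetical):

# Launch the web UI in CPU-only mode with a GGML model
python server.py --chat --cpu --model pygmalion-6b-ggml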

FranticFoxGarage commented 1 year ago

Getting the same deal, but I'm trying to use CUDA with an NVIDIA card. It does the same on both the command-line install and the one-click installer. I've started completely from scratch multiple times, just short of re-setting up WSL.

(textgen) anon@OfficePC:~/text-generation-webui$ python server.py --listen --listen-port 8889 --api --chat
Starting streaming server at ws://0.0.0.0:5005/api/v1/stream
2023-07-27 17:08:41 INFO:Loading the extension "gallery"...
Starting API at http://0.0.0.0:5000/api
Running on local URL:  http://0.0.0.0:8889

To create a public link, set `share=True` in `launch()`.
2023-07-27 17:09:09 INFO:Loading TheBloke_MythoBoros-13B-GPTQ...
2023-07-27 17:09:09 INFO:The AutoGPTQ params are: {'model_basename': 'gptq_model-4bit-128g', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True}
2023-07-27 17:09:09 WARNING:CUDA extension not installed.
2023-07-27 17:09:09 ERROR:Failed to load the model.
Traceback (most recent call last):
  File "/home/anon/text-generation-webui/server.py", line 68, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/home/anon/text-generation-webui/modules/models.py", line 78, in load_model
    output = load_func_map[loader](model_name)
  File "/home/anon/text-generation-webui/modules/models.py", line 287, in AutoGPTQ_loader
    return modules.AutoGPTQ_loader.load_quantized(model_name)
  File "/home/anon/text-generation-webui/modules/AutoGPTQ_loader.py", line 56, in load_quantized
    model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 94, in from_quantized
    return quant_func(
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 749, in from_quantized
    make_quant(
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
    make_quant(
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
    make_quant(
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
    make_quant(
  [Previous line repeated 1 more time]
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 84, in make_quant
    new_layer = QuantLinear(
  File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py", line 83, in __init__
    self.autogptq_cuda = autogptq_cuda_256
NameError: name 'autogptq_cuda_256' is not defined 

System Info

Intel i7-13700K CPU
64GB RAM
RTX 3090 Ti GPU (24 GB VRAM)
G: 1TB SSD (running WSL2 Ubuntu 20.04)
C: 500 GB SSD (System)

Running "nvcc --version":

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Running "nvidia-smi"

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 531.41       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti      On | 00000000:01:00.0  On |                  Off |
|  0%   38C    P8               14W / 450W|   1100MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A        66      G   /Xwayland                                 N/A      |
+---------------------------------------------------------------------------------------+
Flanua commented 1 year ago

> *(quoting FranticFoxGarage's comment above in full)*

I'm pretty sure it's a problem with CUDA or the quant-cuda compilation libraries, or something like that. It's probably possible to make it work after investing a ton of time installing different libraries, but I gave up on that and built my web UI with CPU mode only. Before giving up, I did manage to get the web UI running without CUDA (I don't remember exactly how, but it wasn't too hard). The speed was probably a bit slower without CUDA acceleration, but my GPU worked just fine even with a good number of layers offloaded onto it.
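
One quick way to confirm that diagnosis (a sketch, not a step anyone in this thread ran): try importing the compiled extension directly. The "WARNING:CUDA extension not installed." line in the log above means this import is failing.

# If this raises ImportError, auto_gptq's CUDA kernels were never built or installed
python -c "import autogptq_cuda_256"

# Reinstalling auto-gptq so its kernels are rebuilt against the local CUDA toolkit
# is one common remedy (an assumption, not something confirmed in this thread):
pip uninstall auto-gptq
pip install auto-gptq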

mongolu commented 1 year ago

nvcc V9.0.176 < 11.7. I had this problem myself, on Win11 + WSL2 + Ubuntu 20.04. I think it's about the toolkit version that Ubuntu 20.04 knows about: even after I manually installed 11.7, nvcc still reported 9. I found instructions somewhere on how to symlink the new one, but I struggled a lot with this and eventually decided to use Docker, starting from a plain Ubuntu 22.04 image and adding everything that's needed. I built my Dockerfile with the CUDA toolkit installed in it and used the one-click installer for Ooba; now I don't have problems anymore 🙂. Back to you: take your time to read this (even though it's old), it's a good starting point: https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one
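
A minimal sketch of the symlink/PATH fix described above, assuming the 11.7 toolkit is already installed under /usr/local/cuda-11.7 (the toolkit's default location, not confirmed in this thread):

# Point the 'cuda' symlink and PATH at the newer toolkit
sudo ln -sfn /usr/local/cuda-11.7 /usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

# nvcc should now report release 11.7 instead of 9.0
nvcc --version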

github-actions[bot] commented 1 year ago

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.