Closed CriIzel closed 1 year ago
Same issue. To be more precise, I get:
Traceback (most recent call last):
File "C:\Users\User\Desktop\text-generation-webui\server.py", line 142, in download_model_wrapper
model, branch = downloader.sanitize_model_and_branch_names(model, branch)
File "C:\Users\User\Desktop\text-generation-webui\download-model.py", line 37, in sanitize_model_and_branch_names
if model[-1] == '/':
IndexError: string index out of range
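For what it's worth, `model[-1]` raises `IndexError` whenever `model` is an empty string, so this traceback most likely means the download field was empty when the button was pressed. Below is a minimal sketch of a guard, not the project's actual code: the function name `sanitize_model_name_safely` and the `"main"` default branch are assumptions made for illustration.

```python
from typing import Optional, Tuple


def sanitize_model_name_safely(model: str, branch: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical replacement for the failing check; not the project's real function."""
    model = model.strip()
    if not model:
        # Indexing model[-1] on an empty string is what raised the IndexError above.
        raise ValueError("Model name is empty; enter something like 'user/model'.")
    # str.endswith is safe on any string, unlike model[-1].
    if model.endswith("/"):
        model = model[:-1]
    # Assumption: fall back to "main" when no branch is given.
    return model, branch or "main"
```

With a guard like this, an empty field would produce a readable message instead of an index error.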
Same problem, cannot load any models
Identical error messages when loading the Pygmalion model.
Loading Llama gives somewhat different errors, but the traceback still contains the first two errors (server.py line 68, models.py line 79), followed by:
File "C:\GPT\textgen\modules\models.py", line 149, in huggingface_loader
model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=shared.args.trust_remote_code)
File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\models\auto\auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 3260, in _load_pretrained_model
new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
File "C:\ProgramData\anaconda3\Lib\site-packages\transformers\modeling_utils.py", line 682, in _load_state_dict_into_meta_model
param = param.to(dtype)
Describe the bug
Whenever I try to load the model, an error is shown.
Is there an existing issue for this?
- [x] I have searched the existing issues
Reproduction
AMD is not supported, so I am using CPU mode.
CMD_Flags= --chat --cpu
model: https://huggingface.co/mayaeary/pygmalion-6b_dev-4bit-128g
Screenshot
No response
Logs
Traceback (most recent call last):
File "E:\AI_Programs\oobabooga\text-generation-webui\server.py", line 68, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "E:\AI_Programs\oobabooga\text-generation-webui\modules\models.py", line 79, in load_model
output = load_func_map[loader](model_name)
File "E:\AI_Programs\oobabooga\text-generation-webui\modules\models.py", line 314, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
File "E:\AI_Programs\oobabooga\text-generation-webui\modules\AutoGPTQ_loader.py", line 56, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\auto.py", line 94, in from_quantized
return quant_func(
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_base.py", line 749, in from_quantized
make_quant(
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
make_quant(
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
make_quant(
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 92, in make_quant
make_quant(
[Previous line repeated 1 more time]
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\modeling\_utils.py", line 84, in make_quant
new_layer = QuantLinear(
File "E:\AI_Programs\oobabooga\installer_files\env\lib\site-packages\auto_gptq\nn_modules\qlinear\qlinear_cuda_old.py", line 83, in __init__
self.autogptq_cuda = autogptq_cuda_256
NameError: name 'autogptq_cuda_256' is not defined
System Info
3.40 GHz Intel I7-4770 CPU
8192MB RAM
RX 570 GPU (4075 VRAM)
1TB HDD (OogaBooga installed)
120GB SSD (System)
If you are only using CPU mode, reinstall the web UI in CPU-only mode. I was able to fix a lot of errors that way. P.S.: It also looks like you are using a GPTQ model with safetensors; for CPU mode you need a GGML model instead.
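To make the GPTQ vs. GGML point concrete, here is a rough sketch of a filename check you could run on a model folder before loading it in CPU mode. It is only a naming heuristic, not something the web UI does: the function names and the patterns ("ggml" in a `.bin` file name, "gptq" or "4bit" in the name) are assumptions based on how such files were commonly named, and a model can break them.

```python
def guess_quant_format(filenames):
    """Guess the quantization format from file names alone (heuristic, may be wrong)."""
    names = [name.lower() for name in filenames]
    # Assumption: GGML weights ship as .bin files with "ggml" in the name.
    if any("ggml" in name and name.endswith(".bin") for name in names):
        return "ggml"
    # Assumption: GPTQ weights carry "gptq" or "4bit" in the name.
    if any("gptq" in name or "4bit" in name for name in names):
        return "gptq"
    return "unknown"


def cpu_mode_warning(filenames, cpu_only):
    """Return a warning string when a GPTQ-looking model is used with --cpu, else None."""
    if cpu_only and guess_quant_format(filenames) == "gptq":
        return "This looks like a GPTQ model; CPU mode needs a GGML model instead."
    return None
```

For example, `cpu_mode_warning(["gptq_model-4bit-128g.safetensors"], cpu_only=True)` returns the warning text, while a folder holding only `config.json` returns `None`.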
Getting the same error, but I'm trying to use CUDA with an NVIDIA card. It happens with both the command-line install and the one-click installer. I've started completely from scratch multiple times, just short of setting up WSL again.
(textgen) anon@OfficePC:~/text-generation-webui$ python server.py --listen --listen-port 8889 --api --chat
Starting streaming server at ws://0.0.0.0:5005/api/v1/stream
2023-07-27 17:08:41 INFO:Loading the extension "gallery"...
Starting API at http://0.0.0.0:5000/api
Running on local URL: http://0.0.0.0:8889
To create a public link, set `share=True` in `launch()`.
2023-07-27 17:09:09 INFO:Loading TheBloke_MythoBoros-13B-GPTQ...
2023-07-27 17:09:09 INFO:The AutoGPTQ params are: {'model_basename': 'gptq_model-4bit-128g', 'device': 'cuda:0', 'use_triton': False, 'inject_fused_attention': True, 'inject_fused_mlp': True, 'use_safetensors': True, 'trust_remote_code': False, 'max_memory': None, 'quantize_config': None, 'use_cuda_fp16': True}
2023-07-27 17:09:09 WARNING:CUDA extension not installed.
2023-07-27 17:09:09 ERROR:Failed to load the model.
Traceback (most recent call last):
File "/home/anon/text-generation-webui/server.py", line 68, in load_model_wrapper
shared.model, shared.tokenizer = load_model(shared.model_name, loader)
File "/home/anon/text-generation-webui/modules/models.py", line 78, in load_model
output = load_func_map[loader](model_name)
File "/home/anon/text-generation-webui/modules/models.py", line 287, in AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
File "/home/anon/text-generation-webui/modules/AutoGPTQ_loader.py", line 56, in load_quantized
model = AutoGPTQForCausalLM.from_quantized(path_to_model, **params)
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 94, in from_quantized
return quant_func(
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_base.py", line 749, in from_quantized
make_quant(
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
make_quant(
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
make_quant(
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 92, in make_quant
make_quant(
[Previous line repeated 1 more time]
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 84, in make_quant
new_layer = QuantLinear(
File "/home/anon/miniconda3/envs/textgen/lib/python3.10/site-packages/auto_gptq/nn_modules/qlinear/qlinear_cuda_old.py", line 83, in __init__
self.autogptq_cuda = autogptq_cuda_256
NameError: name 'autogptq_cuda_256' is not defined
System Info
Intel I7-13700K CPU
64GB RAM
RTX 3090 Ti GPU (24 GB VRAM)
G: 1TB SSD (Running WSL2 Ubuntu 20.04)
C: 500 GB SSD (System)
Running "nvcc --version":
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
Running "nvidia-smi"
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03 Driver Version: 531.41 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Ti On | 00000000:01:00.0 On | Off |
| 0% 38C P8 14W / 450W| 1100MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 66 G /Xwayland N/A |
+---------------------------------------------------------------------------------------+
I'm pretty sure the problem is with CUDA or the quant CUDA compilation libraries, or something like that. It's probably possible to make it work by investing a ton of time and installing different libraries, but I gave up on that and built my web UI in CPU-only mode. Before I gave up, I was able to get the web UI to run without CUDA. I don't remember exactly how, but it wasn't too hard. It was probably a bit slower without CUDA acceleration, but my GPU still worked fine, even with a good number of layers offloaded to it.
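The logs above fit that guess: "WARNING:CUDA extension not installed." appears right before the `NameError`. Here is a minimal sketch of how that combination can happen, assuming the library wraps the extension import in a try/except that only logs on failure. The module name `hypothetical_cuda_ext_256` and the class `QuantLinearSketch` are made up for illustration and are not AutoGPTQ's real code.

```python
import logging

logger = logging.getLogger(__name__)

try:
    # Hypothetical stand-in for a compiled CUDA extension that failed to build.
    import hypothetical_cuda_ext_256
except ImportError:
    # The failure is only logged, so the module name is never bound.
    logger.warning("CUDA extension not installed.")


class QuantLinearSketch:
    def __init__(self):
        # When the import above failed, this name lookup raises NameError.
        self.cuda_ext = hypothetical_cuda_ext_256


def try_build_layer():
    """Return the NameError message if the layer cannot be built, else None."""
    try:
        QuantLinearSketch()
    except NameError as exc:
        return str(exc)
    return None
```

If that is what is going on, the `NameError` is only a symptom: the real failure is the missing compiled extension, so the fix is on the install side rather than in the model files.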
nvcc V9.0.176 < 11.7. I had this problem myself on Win11 + WSL2 + Ubuntu 20.04. I think it comes down to which version Ubuntu 20.04 knows about: even after I manually installed 11.7, it still saw 9. I found somewhere how to symlink the new one, but I struggled a lot with this and decided to use Docker instead, starting from a plain Ubuntu 22.04 image and putting everything that's needed into it. I built my Dockerfile with the CUDA toolkit installed in it and used the one-click installer for Ooba. Now I don't have problems anymore 🙂. Back to you: take your time to read this (even though it's old), it's a good starting point. https://askubuntu.com/questions/530043/removing-nvidia-cuda-toolkit-and-installing-new-one
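A quick way to check for this mismatch is to parse the `nvcc --version` output and compare the release against the minimum you need. The sketch below uses the output pasted earlier in this thread as sample input; the helper names are made up, and the 11.7 threshold is simply the number from the comment above, not an official requirement.

```python
import re

# Sample `nvcc --version` output, copied from the report in this thread.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176"""


def parse_nvcc_release(text):
    """Extract the (major, minor) toolkit release from `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("Could not find a 'release X.Y' entry in the nvcc output.")
    return int(match.group(1)), int(match.group(2))


def toolkit_is_new_enough(text, minimum=(11, 7)):
    """Compare the parsed release against a minimum; 11.7 is an assumed threshold."""
    return parse_nvcc_release(text) >= minimum
```

On a live system the text can come from `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`. Note that `nvidia-smi` reports the CUDA version the driver supports (12.1 in the report above), which can differ from the toolkit version that `nvcc` belongs to.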
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.