oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

RuntimeError: CUDA error: no kernel image is available for execution on the device. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. #6183

Open lugangqi opened 5 months ago

lugangqi commented 5 months ago

Describe the bug

Loading an ExLlamaV2 model ("14b-exl") fails. A first attempt died with NameError: name 'os' is not defined in exllamav2\module.py; the retry failed with RuntimeError: CUDA error: no kernel image is available for execution on the device. The full startup output and both tracebacks are reproduced in the Logs section below, and hardware details are under System Info.

Is there an existing issue for this?

Reproduction

Load any ExLlamaV2 model on the Tesla M40 (compute capability 5.2). The M40 is still supported by CUDA 12.4, so shouldn't ExLlamaV2 be able to run on it?
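
As a quick sanity check (a sketch, not part of the original report), standard torch.cuda calls will list each visible GPU and its compute capability, confirming which card triggers the error:

import torch

# List every visible GPU with its compute capability; run this inside
# the web UI's Python environment (installer_files\env).
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {name} (compute capability {major}.{minor})")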

Screenshot

The M40 has compute capability 5.2 and the 4060 Ti has 8.9. ExLlamaV2 does not support compute capability 5.2 devices, so loading on the M40 fails with: RuntimeError: CUDA error: no kernel image is available for execution on the device. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. I'm asking the developers to add support.
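
"No kernel image is available" means the installed binaries contain no kernels compiled for the device's architecture. One way to inspect this (a sketch; note the exllamav2 extension wheels carry their own, separate architecture list) is to ask PyTorch which architectures its own build ships:

import torch

# Print the CUDA architectures the installed PyTorch build was compiled
# for; if 'sm_52' is absent, this build has no Maxwell kernels.
print(torch.cuda.get_arch_list())  # e.g. ['sm_80', 'sm_86', 'sm_89', 'sm_90']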

Logs

01:54:47-255686 INFO Starting Text generation web UI
01:54:47-260684 WARNING trust_remote_code is enabled. This is dangerous.
01:54:47-268684 INFO Loading the extension "openai"
01:54:47-469684 INFO OpenAI-compatible API URL:

                     http://127.0.0.1:5000
Running on local URL: http://127.0.0.1:7860/

01:55:11-441029 INFO Loading "14b-exl"
01:55:12-675028 ERROR Failed to load the model.
Traceback (most recent call last):
File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 94, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 366, in ExLlamav2_loader
from modules.exllamav2 import Exllamav2Model
File "D:\text-generation-webui\modules\exllamav2.py", line 5, in
from exllamav2 import (
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2_init_.py", line 3, in
from exllamav2.model import ExLlamaV2
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 25, in
from exllamav2.linear import ExLlamaV2Linear
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 7, in
from exllamav2.module import ExLlamaV2Module
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 14, in
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
^^
NameError: name 'os' is not defined
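
(Note: this first failure is a separate bug, unrelated to the CUDA error. module.py writes to os.environ before os is imported. Assuming the file matches the traceback, the fix is a one-line import at the top of exllamav2\module.py:)

# Top of exllamav2\module.py: the missing import that causes the NameError
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"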

01:55:54-858096 INFO Loading "14b-exl"
01:55:56-017617 ERROR Failed to load the model.
Traceback (most recent call last):
File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 94, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\models.py", line 368, in ExLlamav2_loader
model, tokenizer = Exllamav2Model.from_pretrained(model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\modules\exllamav2.py", line 60, in from_pretrained
model.load(split)
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 333, in load
for item in f: x = item
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 356, in load_gen
module.load()
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 255, in load
self.k_proj.load()
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 92, in load
if w is None: w = self.load_weight()
^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 110, in load_weight
qtensors = self.load_multi(key, ["q_weight", "q_invperm", "q_scale", "q_scale_max", "q_groups", "q_perm", "bias"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 90, in load_multi
tensors[k] = stfile.get_tensor(key + "." + k, device = self.device())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 204, in get_tensor
tensor = f.get_tensor(key)
^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


System Info

CPU: Xeon E5-2666 v3
Memory: 32 GB DDR3 ECC 1866 MHz
GPU: RTX 4060 Ti 16 GB and Tesla M40 24 GB

I think I found out how to force support for compute capability 5.2 GPUs: cc_flag.append("-gencode")
cc_flag.append("arch=compute_50,code=sm_50"). But I don't know where to add these flags; I hope a developer sees this and can help me solve the problem.
Ph0rk0z commented 5 months ago

ExLlama has no Maxwell support.