turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

help me, error #524

Open lugangqi opened 3 days ago

lugangqi commented 3 days ago

01:54:47-255686 INFO Starting Text generation web UI
01:54:47-260684 WARNING trust_remote_code is enabled. This is dangerous.
01:54:47-268684 INFO Loading the extension "openai"
01:54:47-469684 INFO OpenAI-compatible API URL: http://127.0.0.1:5000

Running on local URL: http://127.0.0.1:7860

01:55:11-441029 INFO Loading "14b-exl"
01:55:12-675028 ERROR Failed to load the model.

Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 366, in ExLlamav2_loader
    from modules.exllamav2 import Exllamav2Model
  File "D:\text-generation-webui\modules\exllamav2.py", line 5, in <module>
    from exllamav2 import (
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 25, in <module>
    from exllamav2.linear import ExLlamaV2Linear
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 7, in <module>
    from exllamav2.module import ExLlamaV2Module
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 14, in <module>
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    ^^
NameError: name 'os' is not defined
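The first traceback is a plain Python bug rather than a hardware problem: module.py sets an environment variable at import time before os has been imported. A minimal sketch of the fix, assuming nothing else in that file already provides the import:

    # exllamav2/module.py (sketch) -- the statement from line 14 of the
    # traceback, plus the import it needs.
    import os  # missing in the installed copy; without it the next line raises NameError

    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

Clearing the NameError only lets the module load; the second traceback below shows what happens after that.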

01:55:54-858096 INFO Loading "14b-exl"
01:55:56-017617 ERROR Failed to load the model.

Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 368, in ExLlamav2_loader
    model, tokenizer = Exllamav2Model.from_pretrained(model_name)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\exllamav2.py", line 60, in from_pretrained
    model.load(split)
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 333, in load
    for item in f: x = item
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 356, in load_gen
    module.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 255, in load
    self.k_proj.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 92, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 110, in load_weight
    qtensors = self.load_multi(key, ["q_weight", "q_invperm", "q_scale", "q_scale_max", "q_groups", "q_perm", "bias"])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 90, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device = self.device())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 204, in get_tensor
    tensor = f.get_tensor(key)
             ^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
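"No kernel image is available for execution on the device" means the compiled extension contains no kernels built for the architecture of the GPU it was asked to run on. A quick way to confirm which installed device is unsupported, using only standard torch calls (nothing exllamav2-specific):

    import torch

    # Print the compute capability of every visible CUDA device.
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        print(f"cuda:{i} {name} -> compute capability {major}.{minor}")

On this machine that should report 8.9 for the 4060 Ti and 5.2 for the M40; prebuilt exllamav2 binaries are compiled only for newer architectures, so the M40 is the device the error points at.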

My hardware: CPU: Xeon E5-2666 v3; memory: 32 GB DDR3 ECC at 1866 MHz; GPUs: RTX 4060 Ti 16 GB and Tesla M40 24 GB.

I think I found how to force it to support compute capability 5.2 GPUs: cc_flag.append("-gencode") and cc_flag.append("arch=compute_50,code=sm_50"). But I don't know where to add these lines. I hope a developer sees this and helps me solve the problem.
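For reference, -gencode flags of that kind normally go into the nvcc arguments of a torch.utils.cpp_extension build script. Below is a minimal sketch of that pattern; the file names and source paths are hypothetical, not exllamav2's actual setup.py, and whether the kernels would even compile and run on Maxwell is a separate question for the developers. Note the M40 is compute capability 5.2, so sm_52 rather than sm_50 would be the matching target:

    # Sketch of a torch cpp_extension build script with explicit -gencode
    # flags. Paths and the extension name are illustrative only.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    cc_flag = []
    # M40 = compute capability 5.2 (sm_52), 4060 Ti = 8.9 (sm_89).
    for arch in ("arch=compute_52,code=sm_52", "arch=compute_89,code=sm_89"):
        cc_flag.append("-gencode")
        cc_flag.append(arch)

    setup(
        name="exllamav2_ext",
        ext_modules=[
            CUDAExtension(
                name="exllamav2_ext",
                sources=["ext/ext.cpp", "ext/kernels.cu"],  # hypothetical paths
                extra_compile_args={"cxx": [], "nvcc": cc_flag},
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )

Alternatively, torch's build helpers honor the TORCH_CUDA_ARCH_LIST environment variable (e.g. "5.2;8.9") when compiling from source, which avoids editing the build script at all.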

lugangqi commented 3 days ago

The M40 has compute capability 5.2 and the 4060 Ti has 8.9. ExLlamaV2 does not support devices with compute capability 5.2; I'm asking the developers to add support.
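Until such support exists, one possible workaround is to hide the unsupported M40 from the process so the extension only ever runs on the 4060 Ti. A minimal sketch, assuming the 4060 Ti enumerates as CUDA device 0:

    import os

    # Must be set before torch or exllamav2 initialize CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # assumption: device 0 is the 4060 Ti

The same variable can also be set in the shell before launching the web UI, with the obvious cost that the M40's 24 GB stays unused.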

lugangqi commented 3 days ago

Developers, please update the kernel image for the M40 GPU. I really need this. I'm not very good with GitHub, so if anyone sees this, please help me contact the developers. Thank you.
