lugangqi opened 5 months ago
The M40 has compute capability 5.2 and the 4060 Ti has 8.9, but ExLlamaV2 does not support devices with compute capability 5.2. Asking the developers to add support.
Developers, please update the image kernels for the M40 graphics card. I really need this. I don't really know how to use GitHub, so if anyone sees this, please help me contact the developer. Thank you.
Sorry for not responding to this right away. I have a lot on my plate, and that's also why I can't really prioritize support for Tesla GPUs. ExLlama relies on features that are not available in sm_52, and even if the code were changed to work around those limitations, the performance would be very poor, as I don't believe the M40 even has native FP16 support.
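For readers who want to confirm which compute capability each card reports, PyTorch exposes it directly. A minimal diagnostic sketch (not part of text-generation-webui or ExLlamaV2):

```python
# Diagnostic sketch: print each visible CUDA device and its compute
# capability. The Tesla M40 is Maxwell (sm_52); the 4060 Ti is sm_89.
import torch

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
```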
```
01:54:47-255686 INFO     Starting Text generation web UI
01:54:47-260684 WARNING  trust_remote_code is enabled. This is dangerous.
01:54:47-268684 INFO     Loading the extension "openai"
01:54:47-469684 INFO     OpenAI-compatible API URL:

Running on local URL: http://127.0.0.1:7860
```
```
01:55:11-441029 INFO     Loading "14b-exl"
01:55:12-675028 ERROR    Failed to load the model.

Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 366, in ExLlamav2_loader
    from modules.exllamav2 import Exllamav2Model
  File "D:\text-generation-webui\modules\exllamav2.py", line 5, in <module>
    from exllamav2 import (
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\__init__.py", line 3, in <module>
    from exllamav2.model import ExLlamaV2
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 25, in <module>
    from exllamav2.linear import ExLlamaV2Linear
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 7, in <module>
    from exllamav2.module import ExLlamaV2Module
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 14, in <module>
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    ^^
NameError: name 'os' is not defined
```
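As an aside, this first traceback is not a GPU problem at all: exllamav2/module.py is using os.environ without importing os, presumably from a local debugging edit, since setting CUDA_LAUNCH_BLOCKING is a common debugging step. A minimal sketch of the fix, assuming nothing earlier in the file imports it:

```python
# Top of exllamav2/module.py: the import must precede the assignment,
# otherwise "os.environ[...]" raises NameError at import time.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # make CUDA errors surface at the failing call
```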
```
01:55:54-858096 INFO     Loading "14b-exl"
01:55:56-017617 ERROR    Failed to load the model.

Traceback (most recent call last):
  File "D:\text-generation-webui\modules\ui_model_menu.py", line 249, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 94, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\models.py", line 368, in ExLlamav2_loader
    model, tokenizer = Exllamav2Model.from_pretrained(model_name)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\modules\exllamav2.py", line 60, in from_pretrained
    model.load(split)
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 333, in load
    for item in f: x = item
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\model.py", line 356, in load_gen
    module.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 255, in load
    self.k_proj.load()
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 92, in load
    if w is None: w = self.load_weight()
                      ^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 110, in load_weight
    qtensors = self.load_multi(key, ["q_weight", "q_invperm", "q_scale", "q_scale_max", "q_groups", "q_perm", "bias"])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\module.py", line 90, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device = self.device())
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\text-generation-webui\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 204, in get_tensor
    tensor = f.get_tensor(key)
             ^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

My hardware: CPU: Xeon E5-2666 v3; RAM: 32 GB DDR3 ECC at 1866 MHz; GPUs: 4060 Ti 16 GB and M40 24 GB.
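This second traceback is the real incompatibility: no kernel compiled for sm_52 is present, so the first CUDA operation touching the M40 fails. Whether the missing image is in PyTorch's own binaries or in the extension, the mismatch can be spotted from Python; a diagnostic sketch (illustrative, not webui code):

```python
# Compare the architectures this PyTorch build ships kernels for with
# the attached GPUs. A device whose sm_XX is absent from the list will
# raise "no kernel image is available for execution on the device".
import torch

built_for = torch.cuda.get_arch_list()  # e.g. ['sm_80', 'sm_86', 'sm_89', 'sm_90']
print("kernels compiled for:", built_for)

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    arch = f"sm_{major}{minor}"
    status = "ok" if arch in built_for else "missing"
    print(f"GPU {i} ({torch.cuda.get_device_name(i)}): {arch} -> {status}")
```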
I think I found out how to force it to support compute capability 5.2 GPUs: cc_flag.append("-gencode") and cc_flag.append("arch=compute_50,code=sm_50"). But I don't know where to add these lines. I hope the developer sees this and can help me solve the problem.
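For context, cc_flag lines like those belong among the nvcc arguments of an extension's build script. A hypothetical setup.py sketch showing where such flags typically go (names and sources are illustrative, not ExLlamaV2's actual build script; the M40 is sm_52, not sm_50):

```python
# Hypothetical sketch: where "-gencode" flags are usually appended when
# building a PyTorch CUDA extension with setuptools. Illustrative only.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

cc_flag = []
cc_flag.append("-gencode")
cc_flag.append("arch=compute_52,code=sm_52")  # target the M40 (sm_52)

setup(
    name="example_ext",                         # placeholder name
    ext_modules=[
        CUDAExtension(
            name="example_ext",
            sources=["ext.cpp", "kernels.cu"],  # placeholder sources
            extra_compile_args={"cxx": [], "nvcc": cc_flag},
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

With torch.utils.cpp_extension builds you can often skip editing the script and instead set TORCH_CUDA_ARCH_LIST="5.2" in the environment before compiling. That said, per the maintainer's reply above, the kernels rely on features absent on sm_52, so build flags alone are unlikely to make the M40 work.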