Open · UlvenDagoth opened this issue 7 months ago
You would have to compile it yourself. Exllama isn't compiled for pascal, nor is performance on that card any good.
So I'd be better off using a different model loader?
Yep, pretty much.
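(For anyone who does want to try the self-compile route anyway: a rough sketch, untested, assuming a working CUDA toolkit and Visual Studio build tools, and assuming the exllamav2 source build honors `TORCH_CUDA_ARCH_LIST` the way PyTorch C++ extensions generally do. Run it from the webui's cmd_windows.bat prompt if you used the one-click installer, so the right env is active.)

```
git clone https://github.com/turboderp/exllamav2
cd exllamav2
rem Target Maxwell (5.2) / Pascal (6.1) kernels; match your card's capability.
set TORCH_CUDA_ARCH_LIST=5.2;6.1
pip install . --no-build-isolation
```

Even if the build succeeds, don't expect much: as noted above, performance on cards that old isn't any good.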
I have the same issue running TheBloke_dolphin-2.6-mistral-7B-AWQ. I compared the conda virtual environment with my alltalk_tts one (pip list); what I found is that there is no CUDA installed. I used the one-click installer (start_windows.bat). I also saw in the releases that there was an update to torch 2.2, but in my pip list it's at version 2.1.
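A quick way to confirm whether that env actually has a CUDA build of torch (run `python` from the webui's cmd_windows.bat prompt, if you used the one-click installer; nothing here is webui-specific):

```python
import torch

print("torch:", torch.__version__)           # a CUDA build looks like "2.1.2+cu121"
print("cuda runtime:", torch.version.cuda)   # None means a CPU-only build
print("cuda available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```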
Having the same issue trying to run a Llama 3 70B model:
128 GB of RAM, 48 GB of VRAM
Error I get is:
```
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
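As the message says, the asynchronous reporting means the traceback can point at the wrong call. A sketch of how to get an accurate one on the Windows one-click setup (set the variable in the same console before launching):

```
rem Makes CUDA kernel launches synchronous so the traceback is accurate.
set CUDA_LAUNCH_BLOCKING=1
start_windows.bat
```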
My M40 24GB hits the same error with ExLlama, while my 4060 Ti 16GB works fine under CUDA 12.4. It seems the author never shipped kernels compiled for the M40. I also asked for help in the ExLlamaV2 author's repo yesterday, but I don't know whether the author will fix this compatibility problem. The M40 has the same architecture as the 980 Ti (compute capability 5.2), which CUDA 12.4 still supports.
You would have to compile it yourself. Exllama isn't compiled for pascal, nor is performance on that card any good.
*Maxwell
But I don't know how to compile, can you tell me?
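(The build sketch earlier in the thread is the general recipe.) To confirm which compute capability the kernels would need to target, torch can report it directly; a minimal check:

```python
import torch

# (major, minor) compute capability; an M40 / 980 Ti should report (5, 2).
print(torch.cuda.get_device_capability(0))
```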
### Describe the bug

I run into this error every time I try to load a model in any of the ExLlama loaders.
```
Traceback (most recent call last):
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\modules\ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\modules\models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\modules\models.py", line 380, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\modules\exllamav2_hf.py", line 181, in from_pretrained
    return Exllamav2HF(config)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\modules\exllamav2_hf.py", line 50, in __init__
    self.ex_model.load(split)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 266, in load
    for item in f: x = item
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\model.py", line 284, in load_gen
    module.load()
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\attn.py", line 191, in load
    self.v_proj.load()
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\linear.py", line 45, in load
    if w is None: w = self.load_weight()
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\module.py", line 111, in load_weight
    qtensors = self.load_multi(key, ["qweight", "qzeros", "scales", "g_idx", "bias"], override_key = override_key)
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\module.py", line 78, in load_multi
    tensors[k] = stfile.get_tensor(key + "." + k, device = self.device())
  File "C:\Users\Burrf\Desktop\text-generation-webui-main\installer_files\env\Lib\site-packages\exllamav2\fasttensors.py", line 118, in get_tensor
    return f.get_tensor(key)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```

I have no idea how to fix this, as I have no coding knowledge, and I've looked everywhere for a fix. Please help me with this, as I'm at my wits' end here.
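For what it's worth, this error class isn't ExLlama-specific: it means some installed wheel's compiled CUDA kernels don't cover your GPU's architecture. A minimal probe to narrow down which wheel (assuming the webui's env is active):

```python
import torch

# If even this raises "no kernel image is available", the torch wheel
# itself lacks kernels for this GPU; if it works, the missing kernels
# are in an extension wheel such as exllamav2.
print(torch.ones(1, device="cuda") * 2)
```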
### Is there an existing issue for this?
### Reproduction

Any time I load a model in ExLlamaV2_HF or ExLlamaV2.
### Screenshot

### Logs

### System Info