Balthoraz closed this issue 7 months ago.
I think I have the same problem. I am using a GTX 1070, and I went with this installation:
1) conda create -n textgen python=3.11
2) conda activate textgen
3) pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
4) conda install -y -c "nvidia/label/cuda-12.1.0" cuda-runtime
5) git clone https://github.com/oobabooga/text-generation-webui
6) cd text-generation-webui
7) pip install -r requirements.txt (the CPU has AVX2)
I also tried cu118 instead of cu121, but it just got worse: I was not able to load any model. Any idea how to fix this? Thanks in advance!
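Before digging further, it may help to confirm what that environment actually ended up with. A minimal sanity check, assuming the cu121 wheels from step 3 (the commented expected values are assumptions for this setup):

```python
# Minimal sanity check for the install above (run inside the "textgen" env).
# The commented values are what the cu121 wheels should report on a GTX 1070.
import torch

print(torch.__version__)          # should end in "+cu121"
print(torch.version.cuda)         # "12.1"
print(torch.cuda.is_available())  # True if driver and runtime line up
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))        # "NVIDIA GeForce GTX 1070"
    print(torch.cuda.get_device_capability(0))  # (6, 1) -> Pascal
```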
This is most likely an environment issue when installing. What happens if you run the following:
pip install -v autoawq
I've got "Requirement already satisfied" everywhere (: ...
I meant to put a command that would install it, so maybe you could try this to reinstall:
pip install -v --upgrade --no-deps --force-reinstall autoawq
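If the forced reinstall goes through, a quick way to confirm which build actually landed in the environment, using only standard package metadata (`awq` is autoawq's import name):

```python
# Verify which autoawq build actually landed in the environment.
from importlib.metadata import version

print(version("autoawq"))  # raises PackageNotFoundError if it is not installed

import awq                 # autoawq's import name
print(awq.__file__)        # confirms which environment the module loads from
```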
I did it with "pip install -v --upgrade --no-deps --force-reinstall autoawq" and there are no changes... I updated my Anaconda as well, but nothing :/ ...
After I write anything, I get:
...
...
...
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Output generated in 0.63 seconds (0.00 tokens/s, 0 tokens, context 69, seed 879323278)
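For anyone debugging this: "no kernel image is available for execution on the device" usually means the compiled CUDA kernels do not cover the GPU's compute capability. A rough diagnostic sketch with standard PyTorch calls; note it only reports what the PyTorch build itself was compiled for, while autoawq ships its own separately compiled kernels:

```python
# Compare the device's compute capability against the architectures the
# installed PyTorch build ships kernels for. autoawq compiles its own
# kernels separately, so this is only an indicative check.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"device: sm_{major}{minor}")                    # GTX 1070 -> sm_61
print("torch built for:", torch.cuda.get_arch_list())  # e.g. ['sm_50', ...]
```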
Same as above...
Ah, I see, you are probably using an unsupported graphics card. You need to be on Turing or later.
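A check along these lines would confirm it before anything crashes mid-generation (a minimal sketch; the (7, 5) threshold is the Turing requirement mentioned above, and the error text is illustrative):

```python
# Fail fast if the GPU predates Turing (compute capability 7.5),
# which the AWQ kernels require per the comment above.
import torch

cap = torch.cuda.get_device_capability(0)
if cap < (7, 5):
    raise RuntimeError(
        f"Compute capability {cap[0]}.{cap[1]} detected; AWQ needs "
        "Turing (7.5) or newer. Try a GPTQ or GGUF model instead."
    )
```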
Because I saw the "1 Task Done" label, I thought the bug was fixed, so I did an update using "git pull", but after loading the model I still get this error when I try to chat:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Output generated in 0.96 seconds (0.00 tokens/s, 0 tokens, context 73, seed 263974982)
So nothing has changed for the GTX 1070 :/
Update: I just figured out that if I load a GPTQ model, I can chat with it! Only the more advanced models with AWQ are not working, apparently...
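That is consistent with the kernel requirements discussed above: as this test shows, the GPTQ kernels still run on Pascal while the AWQ ones need Turing or newer. A purely illustrative sketch of picking a quantization format from the detected capability (the format labels are hypothetical, not the webui's actual loader names):

```python
# Illustrative only: pick a quantization format from the detected compute
# capability, mirroring what this thread reports (AWQ needs sm_75+,
# GPTQ still ran on the Pascal GTX 1070). The labels are hypothetical.
import torch

def pick_format() -> str:
    if not torch.cuda.is_available():
        return "gguf"  # CPU fallback
    return "awq" if torch.cuda.get_device_capability(0) >= (7, 5) else "gptq"

print(pick_format())
```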
I'm getting this error too. Is there any workaround for loading these models on the GTX 1070? What is the problem? Is my graphics card just too old?
I have the same issue, running an RTX 3090 and a GTX 1080; I tried on both Windows and WSL2.
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
My M40 24GB fails in ExLlama the same way, while my 4060 Ti 16GB works fine under CUDA 12.4. It seems the author has not updated the kernels to be compatible with the M40. I also asked for help from the ExLlamaV2 author yesterday, but I don't know whether the author will fix this compatibility problem. The M40 and the 980 Ti have the same architecture, with compute capability 5.2, which CUDA 12.4 still supports.
Describe the bug
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

It shows up only when trying to chat with the model: instead of providing a response, this error is thrown and the chat collapses. The graphics card definitely doesn't have any problem; I am able to use it without issues for other things (Stable Diffusion, etc.). I'm only having problems with this one.
Is there an existing issue for this?
Reproduction
Load a model and try to chat with the GPT.
Screenshot
Logs
System Info