oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

CUDA error when trying to chat #4766

Closed Balthoraz closed 7 months ago

Balthoraz commented 11 months ago

Describe the bug

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

The error only appears when trying to chat: instead of producing a response, this error is thrown and the chat collapses. The graphics card itself is fine; I can use it without problems in other applications (Stable Diffusion, etc.). Only this one has problems.
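
For context, "no kernel image is available for execution on the device" generally means that some compiled CUDA kernel in the environment was not built for this GPU's compute capability. A quick way to compare what the card reports with what the installed torch build targets (just a sketch, run inside the web UI's own Python environment) is:

python -c "import torch; print(torch.cuda.get_device_capability(0), torch.cuda.get_arch_list())"

A GTX 1080 Ti should report (6, 1), i.e. Pascal / sm_61. In the traceback below, though, the failing call is inside the AWQ extension (awq_inference_engine), so the kernels that matter are the ones shipped with that package rather than with torch itself.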

Is there an existing issue for this?

Reproduction

Load a model and try to chat with the GPT.

Screenshot

(two screenshots attached)

Logs

Traceback (most recent call last):
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\modules\callbacks.py", line 57, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\modules\text_generation.py", line 355, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\awq\models\base.py", line 41, in generate
    return self.model.generate(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 1719, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\generation\utils.py", line 2801, in sample
    outputs = self(
              ^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1034, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 922, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 672, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 368, in forward
    value_states = self.v_proj(hidden_states)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Leon\AppData\Roaming\StableDiffusion\text-generation-webui\installer_files\env\Lib\site-packages\awq\modules\linear.py", line 105, in forward
    out = awq_inference_engine.gemm_forward_cuda(x.reshape(-1, x.shape[-1]), self.qweight, self.scales, self.qzeros, 8)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

System Info

Microsoft Windows 10 Pro N (22H2)
Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
64 GB RAM DDR3
GeForce GTX 1080 Ti
No565 commented 11 months ago

I think I have the same problem. I am using a GTX 1070 and went with this installation:

1) conda create -n textgen python=3.11
2) conda activate textgen
3) pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
4) conda install -y -c "nvidia/label/cuda-12.1.0" cuda-runtime
5) git clone https://github.com/oobabooga/text-generation-webui
6) cd text-generation-webui
7) pip install -r requirements.txt (the CPU has AVX2)

I also tried cu118 instead of cu121, but it just got worse: I was not able to load any model at all. Any idea how to fix this? Thanks in advance!
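
One thing that might be worth double-checking (just a sanity check, assuming the textgen env is the one activated) is which torch build actually ended up installed, since a CPU-only or mismatched wheel fails in a similar way:

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

If torch.version.cuda does not print 12.1 here, the cu121 wheel from step 3 is not the one being used.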

casper-hansen commented 11 months ago

This is most likely an environment issue when installing. What happens if you run the following:

pip install -v autoawq
No565 commented 11 months ago

I've got "Requirement already satisfied" everywhere ( : ...

casper-hansen commented 11 months ago

I meant to put a command that would install it, so maybe you could try this to reinstall:

pip install -v --upgrade --no-deps --force-reinstall autoawq
No565 commented 11 months ago

I did it with "pip install -v --upgrade --no-deps --force-reinstall autoawq" and nothing changed... I updated my Anaconda as well, but nothing :/ ...

After I write anything, I get:

... ... ...
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Output generated in 0.63 seconds (0.00 tokens/s, 0 tokens, context 69, seed 879323278)

Same as above...

casper-hansen commented 11 months ago

Ahh, I see, you are probably using an unsupported graphics card. You need to be on Turing or later.
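
If you want to confirm that, one option (a rough sketch; it assumes the CUDA toolkit's cuobjdump is on PATH and that you locate the compiled extension file, e.g. awq_inference_engine*.pyd or *.so under site-packages) is to list the GPU architectures actually embedded in the binary:

cuobjdump --list-elf <path to the awq_inference_engine extension> | findstr sm_

If only sm_75 and newer show up, there is simply no kernel compiled for Pascal cards like the GTX 1070/1080 Ti, and no runtime setting will change that.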

No565 commented 11 months ago

Because I saw the "1 Task Done" label, I thought the bug was fixed, so I updated with git pull, but after loading the model I still get this error when I try to chat:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Output generated in 0.96 seconds (0.00 tokens/s, 0 tokens, context 73, seed 263974982)

So nothing has changed for the GTX 1070 :/

No565 commented 11 months ago

Update: I just figured out that if I load a GPTQ model, I can chat with it! Only the more advanced models with AWQ are not working, apparently...

jbeiter commented 10 months ago

I'm getting this error too. Is there any kind of workaround for loading models on a GTX 1070? What is the problem, is my graphics card just too old?

antoineprobst commented 9 months ago

I have the same issue, running an RTX 3090 and a GTX 1080; I tried on both Windows and WSL2.

github-actions[bot] commented 7 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

lugangqi commented 4 months ago

My M40 24GB hits the same error running ExLlama, while my 4060 Ti 16GB works fine under CUDA 12.4. It seems the author has not updated the kernels to be compatible with the M40. I also asked the ExLlamaV2 author for help yesterday, and I do not know whether this compatibility problem will be fixed. The M40 uses the same architecture as the 980 Ti, compute capability 5.2, which is still supported by CUDA 12.4.
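
For what it is worth, when an extension is built from source instead of installed as a prebuilt wheel, the target architectures can usually be pinned through TORCH_CUDA_ARCH_LIST. This is only a sketch and assumes the kernel source still compiles for Maxwell (compute capability 5.2), which is not guaranteed and may not be supported by the maintainers:

set TORCH_CUDA_ARCH_LIST=5.2        (Windows; on Linux: export TORCH_CUDA_ARCH_LIST="5.2")
pip install -v --force-reinstall --no-binary exllamav2 exllamav2

If the kernels rely on instructions that only exist on newer architectures, the build will fail or still error at runtime, so treat this as something to try rather than a guaranteed fix.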