oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Messages not working on Colab/Gradio #5852

Closed: Clovisint closed this issue 3 months ago

Clovisint commented 5 months ago

Describe the bug

As of today, no message is sent back by the AI. Settings are the default Colab/Gradio ones; I don't know how this computer beep-boop works.

Is there an existing issue for this?

Reproduction

I ran the Colab notebook (https://colab.research.google.com/github/oobabooga/text-generation-webui/blob/main/Colab-TextGen-GPU.ipynb); that is all.

Screenshot

No response

Logs

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
Traceback (most recent call last):
  File "/content/text-generation-webui/modules/callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/content/text-generation-webui/modules/text_generation.py", line 382, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/content/text-generation-webui/modules/exllamav2_hf.py", line 136, in __call__
    self.ex_model.forward(seq_tensor[:-1].view(1, -1), ex_cache, preprocess_only=True, loras=self.loras)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 694, in forward
    r, ls = self._forward(input_ids = input_ids[:, chunk_begin : chunk_end],
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 776, in _forward
    x = module.forward(x, cache = cache, attn_params = attn_params, past_len = past_len, loras = loras, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/attn.py", line 575, in forward
    attn_weights = torch.matmul(q_states, k_states)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmStridedBatchedEx( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Exception in thread Thread-2 (gentask):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/content/text-generation-webui/modules/callbacks.py", line 68, in gentask
    clear_torch_cache()
  File "/content/text-generation-webui/modules/callbacks.py", line 105, in clear_torch_cache
    torch.cuda.empty_cache()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 162, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
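
A note on reading this trace: as the last lines say, CUDA kernel errors are reported asynchronously, so the Python frames above may not point at the call that actually failed. A minimal sketch of rerunning with synchronous kernel launches to get an accurate trace (this assumes server.py is launched directly from a Colab cell; the flags here are illustrative, not the notebook's exact command):

# Synchronous CUDA launches: much slower, but the traceback then points
# at the kernel that really failed. Debugging only.
!CUDA_LAUNCH_BLOCKING=1 python server.py --share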

System Info

Colab, Chrome
Noah-Iovanni-Lorenzo-Diaz commented 5 months ago

I have the exact same problem. I was about to start the Colab as I always do, with the default options, and it just doesn't generate any text. My error message is the same as yours, I think.


python server.py --share --model TheBloke_MythoMax-L2-13B-GPTQ_gptq-4bit-32g-actorder_True --n-gpu-layers 128 --load-in-4bit --use_double_quant --api --public-api
23:29:57-012717 INFO     Starting Text generation web UI                                            
23:29:57-027561 INFO     Loading "TheBloke_MythoMax-L2-13B-GPTQ_gptq-4bit-32g-actorder_True"        
23:30:26-520518 INFO     LOADER: "ExLlamav2_HF"                                                     
23:30:26-522361 INFO     TRUNCATION LENGTH: 4096                                                    
23:30:26-523533 INFO     INSTRUCTION TEMPLATE: "Alpaca"                                             
23:30:26-524603 INFO     Loaded the model in 29.50 seconds.                                         
23:30:26-525713 INFO     Loading the extension "openai"                                             
 * Downloading cloudflared for Linux x86_64...

Running on local URL:  http://127.0.0.1:7860/

23:30:32-249686 INFO     OpenAI-compatible API URL:                                                 

                         https://len-injured-enjoyed-professional.trycloudflare.com/                 

INFO:     88.8.109.167:0 - "GET / HTTP/1.1" 405 Method Not Allowed
INFO:     88.8.109.167:0 - "GET /favicon.ico HTTP/1.1" 404 Not Found
Running on public URL: https://f05d33e71ee4796b45.gradio.live/

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
INFO:     37.10.129.208:0 - "GET / HTTP/1.1" 405 Method Not Allowed
INFO:     37.10.129.208:0 - "GET /v1/models HTTP/1.1" 200 OK
INFO:     37.10.129.208:0 - "GET /v1/internal/model/info HTTP/1.1" 200 OK
Traceback (most recent call last):
  File "/content/text-generation-webui/modules/callbacks.py", line 61, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "/content/text-generation-webui/modules/text_generation.py", line 382, in generate_with_callback
    shared.model.generate(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1575, in generate
    result = self._sample(
  File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2697, in _sample
    outputs = self(
  File "/content/text-generation-webui/modules/exllamav2_hf.py", line 136, in __call__
    self.ex_model.forward(seq_tensor[:-1].view(1, -1), ex_cache, preprocess_only=True, loras=self.loras)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 694, in forward
    r, ls = self._forward(input_ids = input_ids[:, chunk_begin : chunk_end],
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 776, in _forward
    x = module.forward(x, cache = cache, attn_params = attn_params, past_len = past_len, loras = loras, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/exllamav2/attn.py", line 575, in forward
    attn_weights = torch.matmul(q_states, k_states)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmStridedBatchedEx( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Exception in thread Thread-3 (gentask):
Traceback (most recent call last):
  File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/content/text-generation-webui/modules/callbacks.py", line 68, in gentask
    clear_torch_cache()
  File "/content/text-generation-webui/modules/callbacks.py", line 105, in clear_torch_cache
    torch.cuda.empty_cache()
  File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 162, in empty_cache
    torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Noah-Iovanni-Lorenzo-Diaz commented 5 months ago

If you want a quick fix, you can tell the notebook to clone the previous version of the webui, which still works. In the Colab Files panel, rename the text-generation-webui folder to anything else (you can't delete folders that aren't empty, and the rename forces a fresh clone). Then open the notebook's code by double-clicking the second cell, and change the line that says

!git clone https://github.com/oobabooga/text-generation-webui

to

!git clone https://github.com/oobabooga/text-generation-webui --branch "snapshot-2024-03-31"
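
One way to confirm the snapshot was actually checked out after rerunning the cell (a minimal sketch, assuming the default Colab clone path /content/text-generation-webui):

# Show the checked-out commit and any branch/tag names pointing at it;
# the output should mention snapshot-2024-03-31.
!git -C /content/text-generation-webui log -1 --oneline --decorate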

github-actions[bot] commented 3 months ago

This issue has been closed due to 2 months of inactivity. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.