Clovisint closed this 3 months ago.
I have the exact same problem. I was about to start the Colab as I always do, with the default options, and it just doesn't generate any text. My error message is the same as yours, I think.
python server.py --share --model TheBloke_MythoMax-L2-13B-GPTQ_gptq-4bit-32g-actorder_True --n-gpu-layers 128 --load-in-4bit --use_double_quant --api --public-api
23:29:57-012717 INFO Starting Text generation web UI
23:29:57-027561 INFO Loading "TheBloke_MythoMax-L2-13B-GPTQ_gptq-4bit-32g-actorder_True"
23:30:26-520518 INFO LOADER: "ExLlamav2_HF"
23:30:26-522361 INFO TRUNCATION LENGTH: 4096
23:30:26-523533 INFO INSTRUCTION TEMPLATE: "Alpaca"
23:30:26-524603 INFO Loaded the model in 29.50 seconds.
23:30:26-525713 INFO Loading the extension "openai"
* Downloading cloudflared for Linux x86_64...
Running on local URL: http://127.0.0.1:7860/
23:30:32-249686 INFO OpenAI-compatible API URL:
https://len-injured-enjoyed-professional.trycloudflare.com/
INFO: 88.8.109.167:0 - "GET / HTTP/1.1" 405 Method Not Allowed
INFO: 88.8.109.167:0 - "GET /favicon.ico HTTP/1.1" 404 Not Found
Running on public URL: https://f05d33e71ee4796b45.gradio.live/
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
INFO: 37.10.129.208:0 - "GET / HTTP/1.1" 405 Method Not Allowed
INFO: 37.10.129.208:0 - "GET /v1/models HTTP/1.1" 200 OK
INFO: 37.10.129.208:0 - "GET /v1/internal/model/info HTTP/1.1" 200 OK
Traceback (most recent call last):
File "/content/text-generation-webui/modules/callbacks.py", line 61, in gentask
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
File "/content/text-generation-webui/modules/text_generation.py", line 382, in generate_with_callback
shared.model.generate(**kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 1575, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2697, in _sample
outputs = self(
File "/content/text-generation-webui/modules/exllamav2_hf.py", line 136, in __call__
self.ex_model.forward(seq_tensor[:-1].view(1, -1), ex_cache, preprocess_only=True, loras=self.loras)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 694, in forward
r, ls = self._forward(input_ids = input_ids[:, chunk_begin : chunk_end],
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/exllamav2/model.py", line 776, in _forward
x = module.forward(x, cache = cache, attn_params = attn_params, past_len = past_len, loras = loras, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/exllamav2/attn.py", line 575, in forward
attn_weights = torch.matmul(q_states, k_states)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling `cublasGemmStridedBatchedEx( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
Exception in thread Thread-3 (gentask):
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/content/text-generation-webui/modules/callbacks.py", line 68, in gentask
clear_torch_cache()
File "/content/text-generation-webui/modules/callbacks.py", line 105, in clear_torch_cache
torch.cuda.empty_cache()
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/memory.py", line 162, in empty_cache
torch._C._cuda_emptyCache()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
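The log suggests rerunning with CUDA_LAUNCH_BLOCKING=1. CUDA kernel errors are reported asynchronously, so the `torch.cuda.empty_cache()` frame may not be where things actually broke. Below is a minimal sketch of relaunching with that variable set. It assumes the Colab checkout path /content/text-generation-webui seen in the traceback and reuses the exact flags from the command above. The helper names (`debug_env`, `build_command`, `relaunch_for_debugging`) are hypothetical.

```python
import os
import subprocess
import sys

# Assumption: the webui checkout path seen in the traceback above.
WEBUI_DIR = "/content/text-generation-webui"

# The same flags as the launch command posted in this comment.
FLAGS = [
    "--share",
    "--model", "TheBloke_MythoMax-L2-13B-GPTQ_gptq-4bit-32g-actorder_True",
    "--n-gpu-layers", "128",
    "--load-in-4bit",
    "--use_double_quant",
    "--api",
    "--public-api",
]


def debug_env(base=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING=1.

    With blocking launches, a failing kernel raises its error at the call
    that launched it, so the traceback is more likely to be accurate.
    """
    env = dict(os.environ if base is None else base)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def build_command(python=sys.executable):
    """Command line that relaunches server.py with the original flags."""
    return [python, "server.py", *FLAGS]


def relaunch_for_debugging():
    # The variable must be set before CUDA initializes, so it is passed to a
    # fresh child process instead of being changed inside a running session.
    subprocess.run(build_command(), cwd=WEBUI_DIR, env=debug_env(), check=True)
```

Calling `relaunch_for_debugging()` from a fresh cell (after stopping the running server) should give a more trustworthy traceback. Blocking launches run noticeably slower, so this is for debugging only.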
If you want a quick fix, you can make the notebook clone the previous version of the webui, which still works. Go to the Files section and rename the text-generation-webui folder to anything else. You can't delete folders that aren't empty, and renaming it makes the notebook download a fresh copy. Then open the notebook's code by double-clicking the second cell, and change the line that says
!git clone https://github.com/oobabooga/text-generation-webui
to
!git clone https://github.com/oobabooga/text-generation-webui --branch "snapshot-2024-03-31"
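The rename-and-reclone workaround can also be scripted from a notebook cell. Here is a minimal sketch. It assumes the checkout is at /content/text-generation-webui (as in the traceback) and uses the snapshot-2024-03-31 ref from the comment. The "-old" backup folder name and the helper names are hypothetical, not part of the notebook.

```python
import os
import subprocess

REPO_URL = "https://github.com/oobabooga/text-generation-webui"
SNAPSHOT = "snapshot-2024-03-31"  # known-good snapshot named in this thread
WEBUI_DIR = "/content/text-generation-webui"  # assumed Colab checkout path


def clone_command(dest=WEBUI_DIR, ref=SNAPSHOT):
    """git clone invocation pinned to a specific branch or tag."""
    return ["git", "clone", REPO_URL, "--branch", ref, dest]


def backup_name(path, suffix="-old"):
    """Hypothetical naming scheme for the moved-aside folder."""
    return path.rstrip("/") + suffix


def pin_snapshot(dest=WEBUI_DIR, ref=SNAPSHOT):
    # Move the current checkout aside instead of deleting it, so the clone
    # below starts with an empty destination. os.rename raises OSError if a
    # non-empty backup folder with the same name already exists.
    if os.path.exists(dest):
        os.rename(dest, backup_name(dest))
    subprocess.run(clone_command(dest, ref), check=True)
```

Running `pin_snapshot()` once, before the notebook's launch cell, should leave the older snapshot in place of the current checkout.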
This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
Describe the bug
As of today, the AI doesn't send back any message. Settings are the default Colab/Gradio ones. I don't know how this computer beep-boop works.
Is there an existing issue for this?
Reproduction
I ran the Colab notebook (https://colab.research.google.com/github/oobabooga/text-generation-webui/blob/main/Colab-TextGen-GPU.ipynb); that is all.
Screenshot
No response
Logs
System Info
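The System Info section was left empty. Here is a minimal sketch for collecting the details a maintainer would likely want. It assumes PyTorch is the installed backend, as the traceback shows. The function name is hypothetical. Run it in a Colab cell and paste the printed dictionary.

```python
import platform


def collect_system_info():
    """Gather Python, OS, PyTorch and GPU details for a bug report."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        # PyTorch not installed in this environment.
        info["torch"] = None
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    print(collect_system_info())
```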