SunMacArenas opened 2 weeks ago
The Ollama docker image v1.4.7 works normally. GPU: Tesla V100-PCIE-32GB. Nvidia Toolkit: v12.5.
I'm running Ollama via the latest docker image and hit a similar or identical issue:
/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/template-instances/../mmq.cuh:2422: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__
CUDA error: unspecified launch failure
  current device: 0, in function ggml_cuda_op_mul_mat at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1606
  cudaGetLastError()
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
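For context, the "CUDA arch 700" in this message is the GPU's compute capability scaled by 100 (the Tesla V100 is compute capability 7.0), and a build can only run kernels for the arch list it was compiled with (`__CUDA_ARCH_LIST__`). A minimal, hypothetical sketch of that check (the helper names are mine, not Ollama's):

```python
# Hypothetical helpers illustrating the arch check behind this error;
# these names are mine, not part of Ollama or llama.cpp.

def compute_cap_to_arch(cap: str) -> int:
    # Map a compute capability string, e.g. from
    # `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`,
    # to the integer arch used in the error: "7.0" -> 700, "8.6" -> 860.
    major, minor = cap.strip().split(".")
    return int(major) * 100 + int(minor) * 10

def binary_supports(arch: int, compiled_archs: list[int]) -> bool:
    # Device code exists only for the arches the binary was built with;
    # a missing arch produces the "has no device code compatible" abort.
    return arch in compiled_archs

# Tesla V100 is compute capability 7.0, i.e. arch 700.
print(compute_cap_to_arch("7.0"))                      # 700
# A build compiled only for newer arches would reject it:
print(binary_supports(700, [750, 800, 860, 890]))      # False
```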
Recently I got the same error.
ollama version is 0.2.7
It only core dumps when I try:
ollama run deepseek-v2:236b
Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
If I try a smaller model from the same vendor, no problem
ollama run deepseek-v2:16b
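A back-of-the-envelope estimate (my own numbers, assuming a Q4-class quantization at roughly 4.5 bits per weight) suggests why the 16b model fits on two 24 GB cards while the 236b one cannot:

```python
# Rough weight-only VRAM estimate; the ~4.5 bits/weight figure is an
# assumption for a Q4-class quantization, and KV cache, activations and
# CUDA overhead come on top of it.
def approx_weights_vram_gb(params_billion: float,
                           bits_per_weight: float = 4.5) -> float:
    return params_billion * bits_per_weight / 8

total_vram_gb = 2 * 24  # dual RTX 3090

print(approx_weights_vram_gb(16))    # 9.0 GB: fits comfortably in 48 GB
print(approx_weights_vram_gb(236))   # 132.75 GB: far exceeds 48 GB, so
                                     # most layers spill to system RAM
```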
OS: Ubuntu 22.04 LTS GPU: Nvidia
# nvidia-smi
Sun Jul 21 07:39:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:04:00.0 Off | N/A |
| 0% 42C P8 19W / 350W | 13MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:84:00.0 Off | N/A |
| 0% 38C P8 22W / 350W | 13MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2276 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2276 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
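Since a driver/CUDA update is the suspected culprit in this thread (550/12.4 on the working setup vs 555/12.5 on the failing one), a small hypothetical helper for pulling the versions out of an nvidia-smi header line can make the two setups easier to compare (the regex and field names are my own):

```python
import re

# Hypothetical parser for the first banner line of `nvidia-smi` output,
# e.g. "| NVIDIA-SMI 550.90.07  Driver Version: 550.90.07  CUDA Version: 12.4 |".
def parse_smi_header(line: str) -> dict:
    m = re.search(
        r"NVIDIA-SMI\s+(\S+)\s+Driver Version:\s+(\S+)\s+CUDA Version:\s+(\S+)",
        line,
    )
    if not m:
        raise ValueError("not an nvidia-smi header line")
    return {"smi": m.group(1), "driver": m.group(2), "cuda": m.group(3)}

header = ("| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07"
          "      CUDA Version: 12.4     |")
print(parse_smi_header(header))
# {'smi': '550.90.07', 'driver': '550.90.07', 'cuda': '12.4'}
```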
It used to work a few weeks ago.
ollama run deepseek-v2:236b
@SunMacArenas can you share more information about your setup? I'm not able to reproduce the failure, and glm4 loads correctly for me in 0.2.1 and the latest 0.2.8. How much VRAM do you have? Can you share your server log?
@harrytong can you share the ollama ps output from your system on the older version that worked, along with the nvidia-smi output while the model was loaded? How much system memory do you have? If you can share the server log from the older version that worked and from the newer version that fails to load, that may also help us understand what's going wrong.
Hi Daniel,

Unfortunately I cannot bring back my old configuration. I don't know if it was the CUDA 12.5.1 update and/or the Nvidia 555 driver. Right now the only way I can run ollama run deepseek-v2:236b is to unplug my two RTX 3090s and let my dual-Xeon 72 cores do the inference (much slower than when my 2 RTX 3090s can participate). I have a dual-Xeon CPU with 256GB RAM and dual RTX 3090s (48GB total GPU RAM).

Here is my current nvidia-smi output:

Tue Jul 23 20:38:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   31C    P8              8W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   32C    P8             10W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

Here is my ollama version:

ollama version is 0.2.8

BTW, my current hardware and software configuration can run meta's llama3.1:405b locally without issue.
It can also run deepseek-v2:latest (16b) without issue. It only fails when it tries to run deepseek-v2:236b -Harry
Hi Daniel,
Here are my nvidia-smi, ollama ps and server.log when I try to run following model and get the error.
@.***:~# ollama run deepseek-v2:236b
Error: llama runner process has terminated: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: !"CUDA error"
Thanks, Harry
I am also uploading the files here: ollama.ps.txt, ollama.list.txt, ollama.server.log, nvidia-smi.txt
What is the issue?
[root@hanadev system]# ollama run glm4
Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
OS: Linux
GPU: Nvidia
CPU: Intel
Ollama version: 0.21