SunMacArenas opened 2 weeks ago
The Ollama docker image v1.4.7 works normally. GPU: Tesla V100-PCIE-32GB. Nvidia Toolkit: v12.5.
I'm running Ollama via the latest docker image and hit a similar or identical issue:
/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/template-instances/../mmq.cuh:2422: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__
CUDA error: unspecified launch failure
  current device: 0, in function ggml_cuda_op_mul_mat at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1606
  cudaGetLastError()
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
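For context, the "CUDA arch 700" in this message is the GPU's compute capability scaled by 100 (the Tesla V100 is compute capability 7.0), and a build can only run kernels for the arch list it was compiled with (`__CUDA_ARCH_LIST__`). A minimal, hypothetical sketch of that check (the helper names are mine, not Ollama's):

```python
# Hypothetical helpers illustrating the arch check behind this error;
# these names are mine, not part of Ollama or llama.cpp.

def compute_cap_to_arch(cap: str) -> int:
    # Map a compute capability string, e.g. from
    # `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`,
    # to the integer arch used in the error: "7.0" -> 700, "8.6" -> 860.
    major, minor = cap.strip().split(".")
    return int(major) * 100 + int(minor) * 10

def binary_supports(arch: int, compiled_archs: list[int]) -> bool:
    # Device code exists only for the arches the binary was built with;
    # a missing arch produces the "has no device code compatible" abort.
    return arch in compiled_archs

# Tesla V100 is compute capability 7.0, i.e. arch 700.
print(compute_cap_to_arch("7.0"))                      # 700
# A build compiled only for newer arches would reject it:
print(binary_supports(700, [750, 800, 860, 890]))      # False
```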
Recently I got the same error.
ollama version is 0.2.7
It only core dumps when I try:
ollama run deepseek-v2:236b
Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
If I try a smaller model from the same vendor, no problem
ollama run deepseek-v2:16b
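A back-of-the-envelope estimate (my own numbers, assuming a Q4-class quantization at roughly 4.5 bits per weight) suggests why the 16b model fits on two 24 GB cards while the 236b one cannot:

```python
# Rough weight-only VRAM estimate; the ~4.5 bits/weight figure is an
# assumption for a Q4-class quantization, and KV cache, activations and
# CUDA overhead come on top of it.
def approx_weights_vram_gb(params_billion: float,
                           bits_per_weight: float = 4.5) -> float:
    return params_billion * bits_per_weight / 8

total_vram_gb = 2 * 24  # dual RTX 3090

print(approx_weights_vram_gb(16))    # 9.0 GB: fits comfortably in 48 GB
print(approx_weights_vram_gb(236))   # 132.75 GB: far exceeds 48 GB, so
                                     # most layers spill to system RAM
```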
OS: Ubuntu 22.04 LTS GPU: Nvidia
# nvidia-smi
Sun Jul 21 07:39:28 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:04:00.0 Off | N/A |
| 0% 42C P8 19W / 350W | 13MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:84:00.0 Off | N/A |
| 0% 38C P8 22W / 350W | 13MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 2276 G /usr/lib/xorg/Xorg 4MiB |
| 1 N/A N/A 2276 G /usr/lib/xorg/Xorg 4MiB |
+-----------------------------------------------------------------------------------------+
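Since a driver/CUDA update is the suspected culprit in this thread (550/12.4 on the working setup vs 555/12.5 on the failing one), a small hypothetical helper for pulling the versions out of an nvidia-smi header line can make the two setups easier to compare (the regex and field names are my own):

```python
import re

# Hypothetical parser for the first banner line of `nvidia-smi` output,
# e.g. "| NVIDIA-SMI 550.90.07  Driver Version: 550.90.07  CUDA Version: 12.4 |".
def parse_smi_header(line: str) -> dict:
    m = re.search(
        r"NVIDIA-SMI\s+(\S+)\s+Driver Version:\s+(\S+)\s+CUDA Version:\s+(\S+)",
        line,
    )
    if not m:
        raise ValueError("not an nvidia-smi header line")
    return {"smi": m.group(1), "driver": m.group(2), "cuda": m.group(3)}

header = ("| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07"
          "      CUDA Version: 12.4     |")
print(parse_smi_header(header))
# {'smi': '550.90.07', 'driver': '550.90.07', 'cuda': '12.4'}
```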
It used to work a few weeks ago.
ollama run deepseek-v2:236b
@SunMacArenas can you share more information about your setup? I'm not able to reproduce the failure, and glm4 loads correctly for me in 0.2.1 and the latest 0.2.8. How much VRAM do you have? Can you share your server log?
@harrytong can you share the ollama ps output from your system on the older version that worked, along with the nvidia-smi output while the model was loaded? How much system memory do you have? If you can share the server log from the older version that worked and from the newer version that fails to load, that may also help us understand what's going wrong.
Hi Daniel,

Unfortunately I cannot bring back my old configuration. I don't know if it was the CUDA 12.5.1 update and/or the Nvidia 555 driver. Right now the only way I can run ollama run deepseek-v2:236b is to unplug my two RTX 3090s and let my dual-Xeon 72 cores do the inference (much slower than when my 2 RTX 3090s can participate). I have a dual-Xeon CPU with 256GB RAM and dual RTX 3090s (48GB total GPU RAM).

Here is my current nvidia-smi output:

Tue Jul 23 20:38:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   31C    P8              8W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   32C    P8             10W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+

Here is my ollama version:

ollama version is 0.2.8

BTW, my current hardware and software configuration can run meta's llama3.1:405b locally without issue.
It can also run deepseek-v2:latest (16b) without issue. It only fails when it tries to run deepseek-v2:236b -Harry
Hi Daniel,
Here are my nvidia-smi, ollama ps and server.log when I try to run following model and get the error.
@.***:~# ollama run deepseek-v2:236b
Error: llama runner process has terminated: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: !"CUDA error"
Thanks, Harry
I am also uploading the files here: ollama.ps.txt, ollama.list.txt, ollama.server.log, nvidia-smi.txt
What is the issue?
[root@hanadev system]# ollama run glm4
Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3
OS: Linux
GPU: Nvidia
CPU: Intel
Ollama version: 0.21