ollama / ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
https://ollama.com
MIT License

ollama run glm4 error - `CUBLAS_STATUS_NOT_INITIALIZED` #5622

Open SunMacArenas opened 2 weeks ago

SunMacArenas commented 2 weeks ago

What is the issue?

[root@hanadev system]# ollama run glm4
Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"

NVIDIA-SMI 465.19.01 Driver Version: 465.19.01 CUDA Version: 11.3

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.2.1
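
For anyone trying to isolate this outside of Ollama: the assert above fires when cublasCreate_v2 returns CUBLAS_STATUS_NOT_INITIALIZED, which in practice usually points at the driver/CUDA-runtime stack or at the device being unable to allocate the cuBLAS workspace, rather than at the calling code. A minimal standalone check could look like the sketch below (illustrative only, not Ollama code; the file name is arbitrary):

// cublas_init_check.cu -- standalone sketch: does cublasCreate() succeed at all
// on this machine? Build with: nvcc cublas_init_check.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    cudaError_t cerr = cudaSetDevice(0);            // same "current device: 0" as in the error
    if (cerr != cudaSuccess) {
        printf("cudaSetDevice failed: %s\n", cudaGetErrorString(cerr));
        return 1;
    }
    cublasHandle_t handle;
    cublasStatus_t status = cublasCreate(&handle);  // the call that asserts inside ggml-cuda
    if (status != CUBLAS_STATUS_SUCCESS) {
        printf("cublasCreate failed with status %d\n", (int)status);
        return 1;
    }
    printf("cublasCreate succeeded\n");
    cublasDestroy(handle);
    return 0;
}

If this already fails on its own, the problem sits below Ollama in the driver/CUDA stack (the reported driver 465.19.01 / CUDA 11.3 is fairly old, which may matter here); if it succeeds, the failure is more likely tied to what Ollama tries to allocate when it loads the model.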

linxOD commented 2 weeks ago

The Ollama docker image v1.4.7 works normally. GPU: Tesla V100-PCIE-32GB, Nvidia Toolkit: v12.5.

With Ollama via the latest docker image, I hit a similar or the same issue:

/go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/template-instances/../mmq.cuh:2422: ERROR: CUDA kernel mul_mat_q has no device code compatible with CUDA arch 700. ggml-cuda.cu was compiled for: __CUDA_ARCH_LIST__
CUDA error: unspecified launch failure
  current device: 0, in function ggml_cuda_op_mul_mat at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:1606
  cudaGetLastError()
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"
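
The mmq.cuh error is a different failure from the cuBLAS one: it says the prebuilt ggml-cuda kernels in that image carry no device code for CUDA arch 700, which is the compute capability of a Tesla V100 (sm_70), so it points at how the binaries were built rather than at the runtime environment. A small sketch to confirm which architecture the driver reports for each GPU (illustrative only, not Ollama code; the file name is arbitrary):

// arch_check.cu -- print each GPU's compute capability so it can be compared
// against the architectures the ggml-cuda kernels were built for.
// Build with: nvcc arch_check.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // A Tesla V100 reports 7.0, i.e. the "CUDA arch 700" named in the error above.
        printf("device %d: %s, compute capability %d.%d\n",
               i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
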
harrytong commented 1 week ago

Recently I got the same error

ollama version is 0.2.7

It only core dumps when I try

ollama run deepseek-v2:236b

Error: llama runner process has terminated: signal: aborted (core dumped)
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:100: !"CUDA error"

If I try a smaller model from the same vendor, no problem

ollama run deepseek-v2:16b

OS: Ubuntu 22.04 LTS
GPU: Nvidia

# nvidia-smi
Sun Jul 21 07:39:28 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   42C    P8             19W /  350W |      13MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   38C    P8             22W /  350W |      13MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2276      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
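
Since only the largest model triggers the failure here, a variation of the single-device check above that walks both cards, reporting free memory and whether cuBLAS still initializes on each, might help narrow down whether one device is the problem (again an illustrative sketch, not part of Ollama; the file name is arbitrary):

// multi_gpu_check.cu -- report free/total memory and try cublasCreate() on
// every visible GPU. Build with: nvcc multi_gpu_check.cu -lcublas
#include <cstdio>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaSetDevice(i);
        size_t free_b = 0, total_b = 0;
        cudaMemGetInfo(&free_b, &total_b);
        cublasHandle_t handle;
        cublasStatus_t status = cublasCreate(&handle);
        printf("device %d: %zu MiB free / %zu MiB total, cublasCreate -> %d\n",
               i, free_b >> 20, total_b >> 20, (int)status);
        if (status == CUBLAS_STATUS_SUCCESS) cublasDestroy(handle);
    }
    return 0;
}
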
harrytong commented 1 week ago

It used to work a few weeks ago.

ollama run deepseek-v2:236b

dhiltgen commented 1 week ago

@SunMacArenas can you share more information about your setup? I'm not able to reproduce the failure, and glm4 loads correctly for me in 0.2.1 and the latest 0.2.8. How much VRAM do you have? Can you share your server log?

@harrytong can you share the ollama ps output from your system on the older version that worked, along with nvidia-smi output when the model was loaded? How much system memory do you have? If you can share the server log from the older version that worked and from the newer version that fails to load, that may also help us understand what's going wrong.

harrytong commented 1 week ago

Hi Daniel,

Unfortunately I cannot bring back my old configuration. I don't know if it was the CUDA 12.5.1 update and/or the Nvidia 555 driver. Right now the only way I can run ollama run deepseek-v2:236b is to unplug my two RTX 3090s and let my dual-Xeon 72 cores do the inference (much slower than when the two RTX 3090s can participate). I have a dual-Xeon machine with 256GB RAM and dual RTX 3090s (48GB GPU RAM in total).

Here is my current nvidia-smi output:

@.***:/home/harry# nvidia-smi
Tue Jul 23 20:38:10 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.42.06              Driver Version: 555.42.06      CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:04:00.0 Off |                  N/A |
|  0%   31C    P8              8W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off |   00000000:84:00.0 Off |                  N/A |
|  0%   32C    P8             10W /  350W |      18MiB /  24576MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
|    1   N/A  N/A      2585      G   /usr/lib/xorg/Xorg                              4MiB |
+-----------------------------------------------------------------------------------------+
@.***:/home/harry#

Here is my ollama version:

@.***:/home/harry# ollama -v
ollama version is 0.2.8

BTW, with my current hardware and software configuration it can run the Meta llama 3.1:405b locally without issue. It can also run deepseek-v2:latest (16b) without issue. It only fails when it tries to run deepseek-v2:236b.

-Harry
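
For scale, a rough estimate (not taken from the logs): assuming roughly 4-bit quantization, 236e9 parameters x ~0.5 bytes/parameter is on the order of 120 GB of weights before any KV cache, so deepseek-v2:236b cannot fit in the 48 GB of combined VRAM and has to be split between the GPUs and system RAM. That GPU/CPU split is exactly what the ollama ps output dhiltgen asked for should show.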

harrytong commented 6 days ago

Hi Daniel,

Here are my nvidia-smi, ollama ps, and server.log outputs from when I try to run the following model and get the error.

@.***:~# ollama run deepseek-v2:236b
Error: llama runner process has terminated: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED
  current device: 0, in function cublas_handle at /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda/common.cuh:826
  cublasCreate_v2(&cublas_handles[device])
GGML_ASSERT: /go/src/github.com/ollama/ollama/llm/llama.cpp/ggml/src/ggml-cuda.cu:101: !"CUDA

Thanks, Harry

harrytong commented 6 days ago

I am also uploading the files here:
ollama.ps.txt
ollama.list.txt
ollama.server.log
nvidia-smi.txt