ollama / ollama

Get up and running with Llama 3, Mistral, Gemma, and other large language models.
https://ollama.com
MIT License
70.72k stars 5.2k forks

Integrated GPU support #2637

Open · DocMAX opened this issue 3 months ago

DocMAX commented 3 months ago

Opening a new issue (see https://github.com/ollama/ollama/pull/2195) to track support for integrated GPUs. I have an AMD 5800U CPU with integrated graphics. As far as I can tell from my research, ROCR has recently added support for integrated graphics too.

Currently Ollama seems to ignore iGPUs in general.
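Before digging into Ollama's detection logic, it can help to confirm what the kernel itself sees. The sketch below assumes the stock amdgpu driver exposing devices under /sys/class/drm (the sysfs root is a parameter only so the function is testable); rocminfo later in the thread gives the ROCm-side view.

```shell
# List GPUs the kernel exposes via DRM and flag the AMD ones.
# $1: sysfs DRM root, defaulting to the real /sys/class/drm.
list_amd_gpus() {
  local root="${1:-/sys/class/drm}" card vendor
  for card in "$root"/card*; do
    # Only plain cardN nodes; skip connector entries like card0-eDP-1.
    case "${card##*/}" in card[0-9]|card[0-9][0-9]) ;; *) continue ;; esac
    [ -r "$card/device/vendor" ] || continue
    vendor=$(cat "$card/device/vendor")
    # PCI vendor ID 0x1002 is AMD/ATI
    [ "$vendor" = "0x1002" ] && echo "AMD GPU: $card"
  done
  return 0
}

list_amd_gpus
```

If the iGPU shows up here but Ollama still reports "no GPU detected", the problem is in Ollama's (or ROCm's) checks rather than the driver.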

GZGavinZhao commented 3 months ago

ROCm's support for integrated GPUs is not that good. This issue may largely depend on AMD's progress in improving ROCm.

DocMAX commented 3 months ago

OK, but I would like an option to enable it, just to check whether it works.

DocMAX commented 3 months ago

This is what I get with the new Docker image (ROCm support). It detects a Radeon GPU and then says no GPU detected?!


GZGavinZhao commented 3 months ago

Their AMDDetected() function is a bit broken and I haven't figured out a fix for it.

sid-cypher commented 3 months ago

I've seen this behavior in #2411, but only with the version from ollama.com. Try it with the latest released binary? https://github.com/ollama/ollama/releases/tag/v0.1.27

GZGavinZhao commented 3 months ago

Yes, the latest release fixed this behavior.

DocMAX commented 3 months ago

I had a permission issue with LXC/Docker. Now:

time=2024-02-23T19:27:29.715Z level=INFO source=images.go:710 msg="total blobs: 31"
time=2024-02-23T19:27:29.716Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-02-23T19:27:29.717Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)"
time=2024-02-23T19:27:29.717Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-23T19:27:33.385Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx rocm_v6 rocm_v5 cuda_v11 cpu_avx2]"
time=2024-02-23T19:27:33.385Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-23T19:27:33.385Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-23T19:27:33.387Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-23T19:27:33.387Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-23T19:27:33.388Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-23T19:27:33.391Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-23T19:27:33.391Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-23T19:27:33.391Z level=INFO source=gpu.go:181 msg="ROCm unsupported integrated GPU detected"
time=2024-02-23T19:27:33.392Z level=INFO source=routes.go:1042 msg="no GPU detected"

So, as the title says, please add integrated GPU support (AMD 5800U here).

robertvazan commented 3 months ago

Latest (0.1.27) docker image with ROCm works for me on Ryzen 5600G with 8GB VRAM allocation. Prompt processing is 2x faster than with CPU. Generation runs at max speed even if CPU is busy running other processes. I am on Fedora 39.

Container setup:

It's however still shaky:

robertvazan commented 3 months ago

See also discussion in the #738 epic.

DocMAX commented 3 months ago

Why does it work for you? It's still not working here.

services:
  ollama:
    #image: ollama/ollama:latest
    image: ollama/ollama:0.1.27-rocm
    container_name: ollama
    volumes:
      - data:/root/.ollama
    restart: unless-stopped
    devices:
      - /dev/dri
      - /dev/kfd
    security_opt:
      - "seccomp:unconfined"
    group_add:
      - video
    environment:
      - 'HSA_OVERRIDE_GFX_VERSION=9.0.0'
      - 'HCC_AMDGPU_TARGETS=gfx900'
time=2024-02-24T10:16:09.280Z level=INFO source=images.go:710 msg="total blobs: 31"
time=2024-02-24T10:16:09.284Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-02-24T10:16:09.285Z level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.1.27)"
time=2024-02-24T10:16:09.285Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-24T10:16:12.184Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [rocm_v5 rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11]"
time=2024-02-24T10:16:12.184Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-24T10:16:12.184Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-24T10:16:12.188Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-24T10:16:12.188Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-24T10:16:12.189Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.5.0.50701 /opt/rocm-5.7.1/lib/librocm_smi64.so.5.0.50701]"
time=2024-02-24T10:16:12.191Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-24T10:16:12.191Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-24T10:16:12.192Z level=INFO source=gpu.go:181 msg="ROCm unsupported integrated GPU detected"
time=2024-02-24T10:16:12.192Z level=INFO source=routes.go:1042 msg="no GPU detected"

The non-Docker version doesn't work either...

root@ollama:~# HCC_AMDGPU_TARGETS=gfx900 HSA_OVERRIDE_GFX_VERSION=9.0.0 LD_LIBRARY_PATH=/usr/lib ollama serve
time=2024-02-24T10:40:14.582Z level=INFO source=images.go:710 msg="total blobs: 0"
time=2024-02-24T10:40:14.582Z level=INFO source=images.go:717 msg="total unused blobs removed: 0"
time=2024-02-24T10:40:14.583Z level=INFO source=routes.go:1019 msg="Listening on 127.0.0.1:11434 (version 0.1.27)"
time=2024-02-24T10:40:14.583Z level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-02-24T10:40:17.691Z level=INFO source=payload_common.go:146 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 rocm_v6 cuda_v11 rocm_v5 cpu]"
time=2024-02-24T10:40:17.691Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-24T10:40:17.691Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-24T10:40:17.692Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-24T10:40:17.692Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-24T10:40:17.693Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/usr/lib/librocm_smi64.so.1.0]"
time=2024-02-24T10:40:17.696Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-24T10:40:17.696Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-24T10:40:17.696Z level=INFO source=gpu.go:181 msg="ROCm unsupported integrated GPU detected"
time=2024-02-24T10:40:17.696Z level=INFO source=routes.go:1042 msg="no GPU detected"
root@ollama:~# rocminfo
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE
System Endianness:       LITTLE

==========
HSA Agents
==========
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 5800H with Radeon Graphics
  Uuid:                    CPU-XX
  Marketing Name:          AMD Ryzen 7 5800H with Radeon Graphics
  Vendor Name:             CPU
  Feature:                 None specified
  Profile:                 FULL_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        0(0x0)
  Queue Min Size:          0(0x0)
  Queue Max Size:          0(0x0)
  Queue Type:              MULTI
  Node:                    0
  Device Type:             CPU
  Cache Info:
    L1:                      32768(0x8000) KB
  Chip ID:                 0(0x0)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   4463
  BDFID:                   0
  Internal Node ID:        0
  Compute Unit:            16
  SIMDs per CU:            0
  Shader Engines:          0
  Shader Arrs. per Eng.:   0
  WatchPts on Addr. Ranges:1
  Features:                None
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: FINE GRAINED
      Size:                    65216764(0x3e320fc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 2
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    65216764(0x3e320fc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
    Pool 3
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    65216764(0x3e320fc) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       TRUE
  ISA Info:
*******
Agent 2
*******
  Name:                    gfx90c
  Uuid:                    GPU-XX
  Marketing Name:          AMD Radeon Graphics
  Vendor Name:             AMD
  Feature:                 KERNEL_DISPATCH
  Profile:                 BASE_PROFILE
  Float Round Mode:        NEAR
  Max Queue Number:        128(0x80)
  Queue Min Size:          64(0x40)
  Queue Max Size:          131072(0x20000)
  Queue Type:              MULTI
  Node:                    1
  Device Type:             GPU
  Cache Info:
    L1:                      16(0x10) KB
    L2:                      1024(0x400) KB
  Chip ID:                 5688(0x1638)
  Cacheline Size:          64(0x40)
  Max Clock Freq. (MHz):   2000
  BDFID:                   1536
  Internal Node ID:        1
  Compute Unit:            8
  SIMDs per CU:            4
  Shader Engines:          1
  Shader Arrs. per Eng.:   1
  WatchPts on Addr. Ranges:4
  Features:                KERNEL_DISPATCH
  Fast F16 Operation:      TRUE
  Wavefront Size:          64(0x40)
  Workgroup Max Size:      1024(0x400)
  Workgroup Max Size per Dimension:
    x                        1024(0x400)
    y                        1024(0x400)
    z                        1024(0x400)
  Max Waves Per CU:        40(0x28)
  Max Work-item Per CU:    2560(0xa00)
  Grid Max Size:           4294967295(0xffffffff)
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)
    y                        4294967295(0xffffffff)
    z                        4294967295(0xffffffff)
  Max fbarriers/Workgrp:   32
  Pool Info:
    Pool 1
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED
      Size:                    524288(0x80000) KB
      Allocatable:             TRUE
      Alloc Granule:           4KB
      Alloc Alignment:         4KB
      Accessible by all:       FALSE
    Pool 2
      Segment:                 GROUP
      Size:                    64(0x40) KB
      Allocatable:             FALSE
      Alloc Granule:           0KB
      Alloc Alignment:         0KB
      Accessible by all:       FALSE
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-
      Machine Models:          HSA_MACHINE_MODEL_LARGE
      Profiles:                HSA_PROFILE_BASE
      Default Rounding Mode:   NEAR
      Default Rounding Mode:   NEAR
      Fast f16:                TRUE
      Workgroup Max Size:      1024(0x400)
      Workgroup Max Size per Dimension:
        x                        1024(0x400)
        y                        1024(0x400)
        z                        1024(0x400)
      Grid Max Size:           4294967295(0xffffffff)
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)
        y                        4294967295(0xffffffff)
        z                        4294967295(0xffffffff)
      FBarrier Max Size:       32
*** Done ***

@dhiltgen please have a look

DocMAX commented 3 months ago

And by the way, there is no /sys/module/amdgpu/version on my system. The code needs to be corrected.

robertvazan commented 3 months ago

ROCm unsupported integrated GPU detected

Ollama skipped the iGPU, because it has less than 1GB of VRAM. You have to configure VRAM allocation for the iGPU in BIOS to something like 8GB.
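To check the current allocation without rebooting into the BIOS, the amdgpu driver reports the VRAM carve-out in sysfs. A sketch, with two assumptions: the iGPU is card0 (adjust the index otherwise), and the ~1 GiB cutoff is the threshold described above.

```shell
# Read the iGPU's VRAM carve-out (in MiB) from the amdgpu sysfs node.
# $1: path to mem_info_vram_total, defaulting to card0's.
vram_mib() {
  local f="${1:-/sys/class/drm/card0/device/mem_info_vram_total}"
  [ -r "$f" ] || { echo "no amdgpu VRAM info at $f" >&2; return 1; }
  echo $(( $(cat "$f") / 1024 / 1024 ))
}

if mib=$(vram_mib); then
  if [ "$mib" -lt 1024 ]; then
    echo "only ${mib} MiB VRAM allocated; raise the UMA frame buffer size in BIOS"
  fi
fi
```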

DocMAX commented 3 months ago

Ollama skipped the iGPU, because it has less than 1GB of VRAM. You have to configure VRAM allocation for the iGPU in BIOS to something like 8GB.

Thanks, I will check if I can do that. But the normal behaviour for the iGPU should be to request more VRAM when needed.

robertvazan commented 3 months ago

But normal behaviour for the iGPU should be that it requests more VRAM if needed.

Why do you think so? Where is it documented? Mine maxes at 512MB unless I explicitly configure it in BIOS.

sid-cypher commented 3 months ago

Ollama skipped the iGPU, because it has less than 1GB of VRAM. You have to configure VRAM allocation for the iGPU in BIOS to something like 8GB.

Detecting and using this VRAM information without sharing with the user the reason for the iGPU rejection leads to "missing support" issues being opened, rather than "increase my VRAM allocation" steps taken. I think the log output should be improved in this case. This task would probably qualify for a "good first issue" tag, too.

DocMAX commented 3 months ago

Totally agree!

chiragkrishna commented 3 months ago

I have 2 systems. The Ryzen 5500U system always gets stuck here. I've allotted 4GB VRAM for it in the BIOS; that's the max.

export HSA_OVERRIDE_GFX_VERSION=9.0.0
export HCC_AMDGPU_TARGETS=gfx900

llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 25/25 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   703.44 MiB
llm_load_tensors:        CPU buffer size =    35.44 MiB

building with

export CGO_CFLAGS="-g"
export AMDGPU_TARGETS="gfx1030;gfx900"
go generate ./...
go build .

my 6750xt system works perfectly

DocMAX commented 3 months ago

But normal behaviour for the iGPU should be that it requests more VRAM if needed.

Why do you think so? Where is it documented? Mine maxes at 512MB unless I explicitly configure it in BIOS.

OK, I was wrong. It works now with 8GB VRAM, thank you!

discovered 1 ROCm GPU Devices
[0] ROCm device name: Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
[0] ROCm brand: Cezanne [Radeon Vega Series / Radeon Vega Mobile Series]
[0] ROCm vendor: Advanced Micro Devices, Inc. [AMD/ATI]
[0] ROCm VRAM vendor: unknown
[0] ROCm S/N: 
[0] ROCm subsystem name: 0x123
[0] ROCm vbios version: 113-CEZANNE-018
[0] ROCm totalMem 8589934592
[0] ROCm usedMem 25907200
time=2024-02-24T18:27:14.013Z level=DEBUG source=gpu.go:254 msg="rocm detected 1 devices with 7143M available memory"

DocMAX commented 3 months ago

Hmm, I see the model loaded into VRAM, but nothing happens...

llm_load_tensors: ggml ctx size =    0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors:      ROCm0 buffer size =  3577.56 MiB
llm_load_tensors:        CPU buffer size =    70.31 MiB

DocMAX commented 3 months ago

Do I need a different amdgpu module on the host than the one from the kernel (6.7.6)?

sid-cypher commented 3 months ago

Do I need a different amdgpu module on the host than the one from the kernel (6.7.6)?

Maybe, https://github.com/ROCm/ROCm/issues/816 seems relevant. I'm just using AMD-provided DKMS modules from https://repo.radeon.com/amdgpu/6.0.2/ubuntu to be sure.

DocMAX commented 3 months ago

Hmm, the tinyllama model does work with the 5800U. The bigger ones get stuck as I mentioned before. Edit: Codellama works too.

chiragkrishna commented 3 months ago

I added "-DLLAMA_HIP_UMA=ON" to "ollama/llm/generate/gen_linux.sh":

CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DLLAMA_HIPBLAS=on -DLLAMA_HIP_UMA=ON -DCMAKE_C_COMPILER=$ROCM_PATH/llvm/bin/clang -DCMAKE_CXX_COMPILER=$ROCM_PATH/llvm/bin/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"

Now it's stuck here:

llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   809.59 MiB
llm_load_tensors:        CPU buffer size =    51.27 MiB
...............................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    44.00 MiB
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
llama_new_context_with_model:  ROCm_Host input buffer size   =     9.02 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   148.01 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =     4.00 MiB
llama_new_context_with_model: graph splits (measure): 3
[1708857011] warming up the model with an empty run

robertvazan commented 3 months ago

iGPUs indeed do allocate system RAM on demand. It's called GTT/GART. Here's what I get when I run sudo dmesg | grep "M of" on my system with 32GB RAM:

If I set VRAM to Auto in BIOS:

[    4.654736] [drm] amdgpu: 512M of VRAM memory ready
[    4.654737] [drm] amdgpu: 15688M of GTT memory ready.

If I set VRAM to 8GB in BIOS:

[    4.670921] [drm] amdgpu: 8192M of VRAM memory ready
[    4.670923] [drm] amdgpu: 11908M of GTT memory ready.

If I set VRAM to 16GB in BIOS:

[    4.600060] [drm] amdgpu: 16384M of VRAM memory ready
[    4.600062] [drm] amdgpu: 7888M of GTT memory ready.

It looks like GTT size is 0.5*(RAM-VRAM). I wonder how far this can go if you have 64GB or 96GB of RAM. Can you have an iGPU with 32GB or 48GB of GTT memory? That would make a $200 APU with $200 of DDR5 RAM superior to a $2,000 dGPU for running Mixtral and future sparse models. I also wonder whether any BIOS offers a 32GB VRAM setting if you have 64GB of RAM.
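The three dmesg samples above fit that rule well. As a quick sanity check of the inferred GTT ≈ (RAM − VRAM)/2 relation (the driver reserves some memory for itself, which is presumably why the observed numbers land slightly under the estimates):

```shell
# Estimate the GTT pool from total RAM and the BIOS VRAM carve-out (both in
# MiB), using the GTT ~= (RAM - VRAM) / 2 rule inferred from dmesg above.
est_gtt_mib() {
  echo $(( ($1 - $2) / 2 ))
}

est_gtt_mib 32768 512    # estimate 16128 vs observed 15688M
est_gtt_mib 32768 8192   # estimate 12288 vs observed 11908M
est_gtt_mib 32768 16384  # estimate 8192  vs observed 7888M
```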

Unfortunately, ROCm does not use GTT. That thread mentions several workarounds (torch-apu-helper, force-host-alloction-APU, Rusticl, unlock VRAM allocation), but I am not sure whether Ollama would be able to use any of them. Chances are highest in docker container where Ollama has greatest control over dependencies.
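For the force-host-alloction-APU route, the idea is an LD_PRELOAD interposer that redirects ROCm's device allocations into GTT-backed host memory. Usage would look roughly like the sketch below; the source and library filenames here are assumptions taken from that repo, so verify them against its README before running.

```shell
# Build the interposer with ROCm's hipcc (filenames assumed; check the repo),
# then preload it so allocations made by ollama land in host/GTT memory.
git clone https://github.com/segurac/force-host-alloction-APU
cd force-host-alloction-APU
hipcc ./forcegttalloc.c -o libforcegttalloc.so -shared -fPIC

HSA_OVERRIDE_GFX_VERSION=9.0.0 \
LD_PRELOAD="$PWD/libforcegttalloc.so" \
ollama serve
```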

DocMAX commented 3 months ago

Very cool findings. Interesting that you mention 96GB. I did some research, and it seems that's the max we can buy right now for SO-DIMMs. I wasn't aware it's called GTT. Let's hope we get support for this someday. If the host can't handle GTT for ROCm, then I doubt Docker can do anything about it.

https://github.com/segurac/force-host-alloction-APU looks like the best solution to me, if it works. I will try it in my Docker containers...

[Sun Feb 25 21:31:38 2024] [drm] amdgpu: 512M of VRAM memory ready
[Sun Feb 25 21:31:38 2024] [drm] amdgpu: 31844M of GTT memory ready.

This is how much I would get :-) (64GB system)

DocMAX commented 3 months ago

OK, it doesn't work with Ollama. I wasn't aware that Ollama doesn't use PyTorch.

chiragkrishna commented 3 months ago

llama.cpp supports it. That's what I was trying to do in my previous post: "Support AMD Ryzen Unified Memory Architecture (UMA)".

robertvazan commented 3 months ago

@chiragkrishna Do you mean this? https://github.com/ggerganov/llama.cpp/pull/4449

Since llama.cpp already supports UMA (GTT/GART), Ollama could perhaps include a llama.cpp build with UMA enabled and use it when the conditions are right (AMD iGPU with VRAM smaller than the model).

PS: UMA support seems a bit unstable, so perhaps enable it with an environment variable at first.

DocMAX commented 3 months ago

How does the env thing work? Like this? (It doesn't do anything, by the way.) LLAMA_HIP_UMA=1 HSA_OVERRIDE_GFX_VERSION=9.0.0 HCC_AMDGPU_TARGETS=gfx900 ollama start

robertvazan commented 3 months ago

@DocMAX I don't think there's UMA support in ollama yet. It's a compile-time option in llama.cpp. The other env variables (HSA_OVERRIDE_GFX_VERSION was sufficient in my experiments) are correctly passed down to ROCm.

chiragkrishna commented 3 months ago

Do a git clone and add them as shown here:

I added "-DLLAMA_HIP_UMA=ON" to "ollama/llm/generate/gen_linux.sh":

CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DLLAMA_HIPBLAS=on -DLLAMA_HIP_UMA=ON -DCMAKE_C_COMPILER=$ROCM_PATH/llvm/bin/clang -DCMAKE_CXX_COMPILER=$ROCM_PATH/llvm/bin/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"

Now it's stuck here:

llm_load_tensors: offloading 22 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 23/23 layers to GPU
llm_load_tensors:      ROCm0 buffer size =   809.59 MiB
llm_load_tensors:        CPU buffer size =    51.27 MiB
...............................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:      ROCm0 KV buffer size =    44.00 MiB
llama_new_context_with_model: KV self size  =   44.00 MiB, K (f16):   22.00 MiB, V (f16):   22.00 MiB
llama_new_context_with_model:  ROCm_Host input buffer size   =     9.02 MiB
llama_new_context_with_model:      ROCm0 compute buffer size =   148.01 MiB
llama_new_context_with_model:  ROCm_Host compute buffer size =     4.00 MiB
llama_new_context_with_model: graph splits (measure): 3
[1708857011] warming up the model with an empty run

dhiltgen commented 3 months ago

I haven't dug deeply into this yet, but from what I've seen, I believe we'll need a second ROCm variant compiled with system/unified memory support in order to support modern iGPUs. Setting these flags in llama.cpp will degrade performance on discrete GPUs, but since we already have a model for supporting multiple variants, it shouldn't be a problem to have both.

I'm working on some refinements to amdgpu discovery to try to pivot over to pure sysfs discovery, which should help here.

DocMAX commented 3 months ago

CMAKE_DEFS="${COMMON_CMAKE_DEFS} ${CMAKE_DEFS} -DLLAMA_HIPBLAS=on -DLLAMA_HIP_UMA=ON -DCMAKE_C_COMPILER=$ROCM_PATH/llvm/bin/clang -DCMAKE_CXX_COMPILER=$ROCM_PATH/llvm/bin/clang++ -DAMDGPU_TARGETS=$(amdGPUs) -DGPU_TARGETS=$(amdGPUs)"

I did so, but I still get "no GPU detected"...

chiragkrishna commented 3 months ago

build:

git clone https://github.com/ollama/ollama.git
add "-DLLAMA_HIP_UMA=ON" to "ollama/llm/generate/gen_linux.sh" to CMAKE_DEFS=
export CGO_CFLAGS="-g"
export AMDGPU_TARGETS="gfx900"
go generate ./...
go build .

run:

export HSA_OVERRIDE_GFX_VERSION="9.0.0"
./ollama serve

DocMAX commented 3 months ago

time=2024-02-27T22:36:04.112Z level=INFO source=gpu.go:94 msg="Detecting GPU type"
time=2024-02-27T22:36:04.112Z level=INFO source=gpu.go:265 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-02-27T22:36:04.156Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: []"
time=2024-02-27T22:36:04.156Z level=INFO source=gpu.go:265 msg="Searching for GPU management library librocm_smi64.so"
time=2024-02-27T22:36:04.165Z level=INFO source=gpu.go:311 msg="Discovered GPU libraries: [/opt/rocm/lib/librocm_smi64.so.6.0.60002 /opt/rocm-6.0.2/lib/librocm_smi64.so.6.0.60002]"
time=2024-02-27T22:36:04.240Z level=INFO source=gpu.go:109 msg="Radeon GPU detected"
time=2024-02-27T22:36:04.240Z level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-02-27T22:36:04.240Z level=INFO source=gpu.go:181 msg="ROCm unsupported integrated GPU detected"
time=2024-02-27T22:36:04.240Z level=INFO source=routes.go:1042 msg="no GPU detected"

I did exactly that, but it's not working... very strange. CPU: AMD 5800U

chiragkrishna commented 3 months ago

Try playing with:

ROCR_VISIBLE_DEVICES=0 ollama serve

or

ROCR_VISIBLE_DEVICES=1 ollama serve

Ollama currently has a few broken checks for AMD integrated GPUs:

gpu.go:181 msg="ROCm unsupported integrated GPU detected"

DocMAX commented 3 months ago

Nope, doesn't make a difference :-(

chiragkrishna commented 3 months ago

Change the "tooOld" check to this, compile, and see.

ollama/gpu/gpu.go from line 173

gfx := AMDGFXVersions()
tooOld := false
for _, v := range gfx {
    if v.Major < 9 {
        slog.Info("AMD GPU too old, falling back to CPU " + v.ToGFXString())
        tooOld = true
        break
    }

    // TODO - remap gfx strings for unsupported minor/patch versions to supported for the same major
    // e.g. gfx1034 works if we map it to gfx1030 at runtime
}
if !tooOld {
    // TODO - this algo can be shifted over to use sysfs instead of the rocm info library...
    C.rocm_check_vram(*gpuHandles.rocm, &memInfo)
    resp.Library = "rocm"
    var version C.rocm_version_resp_t
    C.rocm_get_version(*gpuHandles.rocm, &version)
    verString := C.GoString(version.str)
    if version.status == 0 {
        resp.Variant = "v" + verString
    } else {
        slog.Info(fmt.Sprintf("failed to look up ROCm version: %s", verString))
    }
    C.free(unsafe.Pointer(version.str))
}
}
if resp.Library == "" {
    C.cpu_check_ram(&memInfo)
    resp.Library = "cpu"
    resp.Variant = cpuVariant
Even if your GPU is detected, I guess you will get stuck at the same point I did.

DocMAX commented 3 months ago

Still no GPU, I give up.

DocMAX commented 2 months ago

Something happened now...

Mar 08 01:22:47 ai ollama[10945]: time=2024-03-08T01:22:47.056Z level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx9012]"
Mar 08 01:22:47 ai ollama[10945]: time=2024-03-08T01:22:47.056Z level=INFO source=amd_linux.go:238 msg="[1] amdgpu totalMemory 536870912"
Mar 08 01:22:47 ai ollama[10945]: time=2024-03-08T01:22:47.056Z level=INFO source=amd_linux.go:239 msg="[1] amdgpu freeMemory  536870912"
Mar 08 01:22:47 ai ollama[10945]: time=2024-03-08T01:22:47.056Z level=INFO source=llm.go:111 msg="not enough vram available, falling back to CPU only"

I compiled with "-DLLAMA_HIP_UMA=ON"... so UMA is still not working...

chiragkrishna commented 2 months ago

Compiled just now. No luck with the Ryzen 5500U.

ollama serve
time=2024-03-08T07:36:58.079+05:30 level=INFO source=images.go:796 msg="total blobs: 11"
time=2024-03-08T07:36:58.079+05:30 level=INFO source=images.go:803 msg="total unused blobs removed: 0"
[GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.

[GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
 - using env:   export GIN_MODE=release
 - using code:  gin.SetMode(gin.ReleaseMode)

[GIN-debug] POST   /api/pull                 --> github.com/jmorganca/ollama/server.PullModelHandler (5 handlers)
[GIN-debug] POST   /api/generate             --> github.com/jmorganca/ollama/server.GenerateHandler (5 handlers)
[GIN-debug] POST   /api/chat                 --> github.com/jmorganca/ollama/server.ChatHandler (5 handlers)
[GIN-debug] POST   /api/embeddings           --> github.com/jmorganca/ollama/server.EmbeddingsHandler (5 handlers)
[GIN-debug] POST   /api/create               --> github.com/jmorganca/ollama/server.CreateModelHandler (5 handlers)
[GIN-debug] POST   /api/push                 --> github.com/jmorganca/ollama/server.PushModelHandler (5 handlers)
[GIN-debug] POST   /api/copy                 --> github.com/jmorganca/ollama/server.CopyModelHandler (5 handlers)
[GIN-debug] DELETE /api/delete               --> github.com/jmorganca/ollama/server.DeleteModelHandler (5 handlers)
[GIN-debug] POST   /api/show                 --> github.com/jmorganca/ollama/server.ShowModelHandler (5 handlers)
[GIN-debug] POST   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.CreateBlobHandler (5 handlers)
[GIN-debug] HEAD   /api/blobs/:digest        --> github.com/jmorganca/ollama/server.HeadBlobHandler (5 handlers)
[GIN-debug] POST   /v1/chat/completions      --> github.com/jmorganca/ollama/server.ChatHandler (6 handlers)
[GIN-debug] GET    /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] GET    /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] GET    /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
[GIN-debug] HEAD   /                         --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
[GIN-debug] HEAD   /api/tags                 --> github.com/jmorganca/ollama/server.ListModelsHandler (5 handlers)
[GIN-debug] HEAD   /api/version              --> github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func3 (5 handlers)
time=2024-03-08T07:36:58.079+05:30 level=INFO source=routes.go:1019 msg="Listening on [::]:11434 (version 0.0.0)"
time=2024-03-08T07:36:58.080+05:30 level=INFO source=payload_common.go:107 msg="Extracting dynamic libraries..."
time=2024-03-08T07:36:58.922+05:30 level=INFO source=payload_common.go:150 msg="Dynamic LLM libraries [rocm_v6 cpu_avx2 cpu cpu_avx rocm_v60002]"
time=2024-03-08T07:36:58.922+05:30 level=DEBUG source=payload_common.go:151 msg="Override detection logic by setting OLLAMA_LLM_LIBRARY"
time=2024-03-08T07:36:58.922+05:30 level=INFO source=gpu.go:77 msg="Detecting GPU type"
time=2024-03-08T07:36:58.922+05:30 level=INFO source=gpu.go:191 msg="Searching for GPU management library libnvidia-ml.so"
time=2024-03-08T07:36:58.922+05:30 level=DEBUG source=gpu.go:209 msg="gpu management search paths: [/usr/local/cuda/lib64/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/x86_64-linux-gnu/libnvidia-ml.so* /usr/lib/wsl/lib/libnvidia-ml.so* /usr/lib/wsl/drivers/*/libnvidia-ml.so* /opt/cuda/lib64/libnvidia-ml.so* /usr/lib*/libnvidia-ml.so* /usr/local/lib*/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/nvidia/current/libnvidia-ml.so* /usr/lib/aarch64-linux-gnu/libnvidia-ml.so* /opt/cuda/targets/x86_64-linux/lib/stubs/libnvidia-ml.so* /home/bunneo/libnvidia-ml.so*]"
time=2024-03-08T07:36:58.924+05:30 level=INFO source=gpu.go:237 msg="Discovered GPU libraries: []"
time=2024-03-08T07:36:58.924+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-08T07:36:58.924+05:30 level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-08T07:36:58.924+05:30 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx9012]"
time=2024-03-08T07:36:58.924+05:30 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /home/bunneo/.ollama/assets/0.0.0/rocm"
time=2024-03-08T07:36:58.924+05:30 level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=9.0.0"
time=2024-03-08T07:36:58.925+05:30 level=DEBUG source=amd_linux.go:171 msg="discovering amdgpu devices [1]"
time=2024-03-08T07:36:58.925+05:30 level=INFO source=amd_linux.go:238 msg="[1] amdgpu totalMemory 4294967296"
time=2024-03-08T07:36:58.925+05:30 level=INFO source=amd_linux.go:239 msg="[1] amdgpu freeMemory  4294967296"
time=2024-03-08T07:36:58.925+05:30 level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 3072M available memory"
[GIN] 2024/03/08 - 07:37:12 | 200 |      60.553µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/03/08 - 07:37:12 | 200 |     349.625µs |       127.0.0.1 | POST     "/api/show"
[GIN] 2024/03/08 - 07:37:12 | 200 |     178.304µs |       127.0.0.1 | POST     "/api/show"
time=2024-03-08T07:37:12.650+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-08T07:37:12.651+05:30 level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-08T07:37:12.651+05:30 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx9012]"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /home/bunneo/.ollama/assets/0.0.0/rocm"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=9.0.0"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_linux.go:171 msg="discovering amdgpu devices [1]"
time=2024-03-08T07:37:12.651+05:30 level=INFO source=amd_linux.go:238 msg="[1] amdgpu totalMemory 4294967296"
time=2024-03-08T07:37:12.651+05:30 level=INFO source=amd_linux.go:239 msg="[1] amdgpu freeMemory  4294967296"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=gpu.go:180 msg="rocm detected 1 devices with 3072M available memory"
time=2024-03-08T07:37:12.651+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-08T07:37:12.651+05:30 level=WARN source=amd_linux.go:53 msg="ollama recommends running the https://www.amd.com/en/support/linux-drivers: amdgpu version file missing: /sys/module/amdgpu/version stat /sys/module/amdgpu/version: no such file or directory"
time=2024-03-08T07:37:12.651+05:30 level=INFO source=amd_linux.go:88 msg="detected amdgpu versions [gfx9012]"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_common.go:16 msg="evaluating potential rocm lib dir /home/bunneo/.ollama/assets/0.0.0/rocm"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_linux.go:123 msg="skipping rocm gfx compatibility check with HSA_OVERRIDE_GFX_VERSION=9.0.0"
time=2024-03-08T07:37:12.651+05:30 level=DEBUG source=amd_linux.go:171 msg="discovering amdgpu devices [1]"
time=2024-03-08T07:37:12.652+05:30 level=INFO source=amd_linux.go:238 msg="[1] amdgpu totalMemory 4294967296"
time=2024-03-08T07:37:12.652+05:30 level=INFO source=amd_linux.go:239 msg="[1] amdgpu freeMemory  4294967296"
time=2024-03-08T07:37:12.652+05:30 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-03-08T07:37:12.652+05:30 level=DEBUG source=payload_common.go:93 msg="ordered list of LLM libraries to try [/home/bunneo/.ollama/assets/0.0.0/rocm_v6/libext_server.so /home/bunneo/.ollama/assets/0.0.0/rocm_v60002/libext_server.so /home/bunneo/.ollama/assets/0.0.0/cpu_avx2/libext_server.so]"
loading library /home/bunneo/.ollama/assets/0.0.0/rocm_v6/libext_server.so
time=2024-03-08T07:37:12.687+05:30 level=INFO source=dyn_ext_server.go:90 msg="Loading Dynamic llm server: /home/bunneo/.ollama/assets/0.0.0/rocm_v6/libext_server.so"
time=2024-03-08T07:37:12.688+05:30 level=INFO source=dyn_ext_server.go:150 msg="Initializing llama server"
time=2024-03-08T07:37:12.688+05:30 level=DEBUG source=dyn_ext_server.go:151 msg="server params: {model:0x7e7f1c109df0 n_ctx:2048 n_batch:512 n_threads:0 n_parallel:1 rope_freq_base:0 rope_freq_scale:0 memory_f16:true n_gpu_layers:23 main_gpu:0 use_mlock:false use_mmap:true numa:0 embedding:true lora_adapters:<nil> mmproj:<nil> verbose_logging:true _:[0 0 0 0 0 0 0]}"
[1709863632] system info: AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | 
[1709863632] Performing pre-initialization of GPU
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, compute capability 9.0, VMM: no
CUDA error: out of memory
  current device: 0, in function ggml_init_cublas at /home/bunneo/ollama/llm/llama.cpp/ggml-cuda.cu:8771
  hipStreamCreateWithFlags(&g_cudaStreams[id][is], 0x01)
GGML_ASSERT: /home/bunneo/ollama/llm/llama.cpp/ggml-cuda.cu:256: !"CUDA error"
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
SIGABRT: abort
PC=0x7e7f8fc969fc m=9 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 8 gp=0xc000580a80 m=9 mp=0xc000584808 [syscall]:
runtime.cgocall(0xeba490, 0xc00004c760)
    /usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00004c738 sp=0xc00004c700 pc=0x40a74b
github.com/jmorganca/ollama/llm._Cfunc_dyn_llama_server_init({0x7e7f1c001230, 0x7e7f02e82840, 0x7e7f02e83120, 0x7e7f02e831b0, 0x7e7f02e83410, 0x7e7f02e83600, 0x7e7f02e83ee0, 0x7e7f02e83eb0, 0x7e7f02e83fa0, 0x7e7f02e844f0, ...}, ...)
    _cgo_gotypes.go:290 +0x45 fp=0xc00004c760 sp=0xc00004c738 pc=0xce47a5
github.com/jmorganca/ollama/llm.newDynExtServer.func7(0xc0000a8230, 0xc00089c030)
    /home/bunneo/ollama/llm/dyn_ext_server.go:154 +0x112 fp=0xc00004c8a0 sp=0xc00004c760 pc=0xce5e52
github.com/jmorganca/ollama/llm.newDynExtServer({0xc0003c1600, 0x3a}, {0xc00055e230, _}, {_, _, _}, {0x0, 0x0, 0x0}, ...)
    /home/bunneo/ollama/llm/dyn_ext_server.go:154 +0xb50 fp=0xc00004cae8 sp=0xc00004c8a0 pc=0xce5a90
github.com/jmorganca/ollama/llm.newLlmServer({{_, _, _}, {_, _}, {_, _}}, {_, _}, {0x0, ...}, ...)
    /home/bunneo/ollama/llm/llm.go:158 +0x4c5 fp=0xc00004cca8 sp=0xc00004cae8 pc=0xce2085
github.com/jmorganca/ollama/llm.New({0xc00055e230, 0x69}, {0x0, 0x0, 0x0}, {0x0, _, _}, {{0x0, 0x800, ...}, ...})
    /home/bunneo/ollama/llm/llm.go:123 +0x76e fp=0xc00004cf18 sp=0xc00004cca8 pc=0xce194e
github.com/jmorganca/ollama/server.load(0xc000554000?, 0xc000554000, {{0x0, 0x800, 0x200, 0x1, 0xffffffffffffffff, 0x0, 0x0, 0x1, ...}, ...}, ...)
    /home/bunneo/ollama/server/routes.go:83 +0x325 fp=0xc00004d068 sp=0xc00004cf18 pc=0xe92ae5
github.com/jmorganca/ollama/server.ChatHandler(0xc0000c9600)
    /home/bunneo/ollama/server/routes.go:1173 +0xa37 fp=0xc00004d770 sp=0xc00004d068 pc=0xe9e2b7
github.com/gin-gonic/gin.(*Context).Next(...)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/jmorganca/ollama/server.(*Server).GenerateRoutes.func1(0xc0000c9600)
    /home/bunneo/ollama/server/routes.go:943 +0x68 fp=0xc00004d7a8 sp=0xc00004d770 pc=0xe9ca48
github.com/gin-gonic/gin.(*Context).Next(...)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.CustomRecoveryWithWriter.func1(0xc0000c9600)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/recovery.go:102 +0x7a fp=0xc00004d7f8 sp=0xc00004d7a8 pc=0xe72a1a
github.com/gin-gonic/gin.(*Context).Next(...)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.LoggerWithConfig.func1(0xc0000c9600)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/logger.go:240 +0xdd fp=0xc00004d9a8 sp=0xc00004d7f8 pc=0xe71b5d
github.com/gin-gonic/gin.(*Context).Next(...)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/context.go:174
github.com/gin-gonic/gin.(*Engine).handleHTTPRequest(0xc0000d0340, 0xc0000c9600)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:620 +0x66e fp=0xc00004db28 sp=0xc00004d9a8 pc=0xe7104e
github.com/gin-gonic/gin.(*Engine).ServeHTTP(0xc0000d0340, {0x43da148, 0xc000178000}, 0xc000172000)
    /home/bunneo/go/pkg/mod/github.com/gin-gonic/gin@v1.9.1/gin.go:576 +0x1b2 fp=0xc00004db60 sp=0xc00004db28 pc=0xe70812
net/http.serverHandler.ServeHTTP({0x43d8028?}, {0x43da148?, 0xc000178000?}, 0x6?)
    /usr/local/go/src/net/http/server.go:3137 +0x8e fp=0xc00004db90 sp=0xc00004db60 pc=0x6fe1ee
net/http.(*conn).serve(0xc0000cc090, {0x43dc508, 0xc0001af440})
    /usr/local/go/src/net/http/server.go:2039 +0x5e8 fp=0xc00004dfb8 sp=0xc00004db90 pc=0x6f95a8
net/http.(*Server).Serve.gowrap3()
    /usr/local/go/src/net/http/server.go:3285 +0x28 fp=0xc00004dfe0 sp=0xc00004dfb8 pc=0x6fea08
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x474301
created by net/http.(*Server).Serve in goroutine 1
    /usr/local/go/src/net/http/server.go:3285 +0x4b4

goroutine 1 gp=0xc0000061c0 m=nil [IO wait]:
runtime.gopark(0xc000054008?, 0x0?, 0xc0?, 0x61?, 0xc0005af870?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc0005af838 sp=0xc0005af818 pc=0x44162e
runtime.netpollblock(0xc0005af8d0?, 0x409ee6?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:573 +0xf7 fp=0xc0005af870 sp=0xc0005af838 pc=0x43a397
internal/poll.runtime_pollWait(0x7e7f8feabe40, 0x72)
    /usr/local/go/src/runtime/netpoll.go:345 +0x85 fp=0xc0005af890 sp=0xc0005af870 pc=0x46ea05
internal/poll.(*pollDesc).wait(0x3?, 0x3fe?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc0005af8b8 sp=0xc0005af890 pc=0x5030c7
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc000525100)
    /usr/local/go/src/internal/poll/fd_unix.go:611 +0x2ac fp=0xc0005af960 sp=0xc0005af8b8 pc=0x50846c
net.(*netFD).accept(0xc000525100)
    /usr/local/go/src/net/fd_unix.go:172 +0x29 fp=0xc0005afa18 sp=0xc0005af960 pc=0x597c69
net.(*TCPListener).accept(0xc00052d660)
    /usr/local/go/src/net/tcpsock_posix.go:159 +0x1e fp=0xc0005afa40 sp=0xc0005afa18 pc=0x5acf3e
net.(*TCPListener).Accept(0xc00052d660)
    /usr/local/go/src/net/tcpsock.go:327 +0x30 fp=0xc0005afa70 sp=0xc0005afa40 pc=0x5ac130
net/http.(*onceCloseListener).Accept(0xc0000cc090?)
    <autogenerated>:1 +0x24 fp=0xc0005afa88 sp=0xc0005afa70 pc=0x720bc4
net/http.(*Server).Serve(0xc0005200f0, {0x43d9ed8, 0xc00052d660})
    /usr/local/go/src/net/http/server.go:3255 +0x33e fp=0xc0005afbb8 sp=0xc0005afa88 pc=0x6fe61e
github.com/jmorganca/ollama/server.Serve({0x43d9ed8, 0xc00052d660})
    /home/bunneo/ollama/server/routes.go:1046 +0x4ab fp=0xc0005afcc0 sp=0xc0005afbb8 pc=0xe9cf4b
github.com/jmorganca/ollama/cmd.RunServer(0xc0000c8b00?, {0x4b2f300?, 0x4?, 0x1050133?})
    /home/bunneo/ollama/cmd/cmd.go:787 +0x1b9 fp=0xc0005afd58 sp=0xc0005afcc0 pc=0xeb0c99
github.com/spf13/cobra.(*Command).execute(0xc00054b508, {0x4b2f300, 0x0, 0x0})
    /home/bunneo/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:940 +0x882 fp=0xc0005afe78 sp=0xc0005afd58 pc=0x793b42
github.com/spf13/cobra.(*Command).ExecuteC(0xc00054a908)
    /home/bunneo/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:1068 +0x3a5 fp=0xc0005aff30 sp=0xc0005afe78 pc=0x794385
github.com/spf13/cobra.(*Command).Execute(...)
    /home/bunneo/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:992
github.com/spf13/cobra.(*Command).ExecuteContext(...)
    /home/bunneo/go/pkg/mod/github.com/spf13/cobra@v1.7.0/command.go:985
main.main()
    /home/bunneo/ollama/main.go:11 +0x4d fp=0xc0005aff50 sp=0xc0005aff30 pc=0xeb8e4d
runtime.main()
    /usr/local/go/src/runtime/proc.go:271 +0x29d fp=0xc0005affe0 sp=0xc0005aff50 pc=0x4411fd
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005affe8 sp=0xc0005affe0 pc=0x474301

goroutine 2 gp=0xc000006c40 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000074fa8 sp=0xc000074f88 pc=0x44162e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.forcegchelper()
    /usr/local/go/src/runtime/proc.go:326 +0xb3 fp=0xc000074fe0 sp=0xc000074fa8 pc=0x4414b3
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x474301
created by runtime.init.6 in goroutine 1
    /usr/local/go/src/runtime/proc.go:314 +0x1a

goroutine 3 gp=0xc000007180 m=nil [GC sweep wait]:
runtime.gopark(0x1?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000075780 sp=0xc000075760 pc=0x44162e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.bgsweep(0xc00007c000)
    /usr/local/go/src/runtime/mgcsweep.go:318 +0xdf fp=0xc0000757c8 sp=0xc000075780 pc=0x42cbdf
runtime.gcenable.gowrap1()
    /usr/local/go/src/runtime/mgc.go:203 +0x25 fp=0xc0000757e0 sp=0xc0000757c8 pc=0x4214c5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000757e8 sp=0xc0000757e0 pc=0x474301
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:203 +0x66

goroutine 4 gp=0xc000007340 m=nil [GC scavenge wait]:
runtime.gopark(0xa479cc?, 0x9d1b96?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000075f78 sp=0xc000075f58 pc=0x44162e
runtime.goparkunlock(...)
    /usr/local/go/src/runtime/proc.go:408
runtime.(*scavengerState).park(0x4ac92a0)
    /usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc000075fa8 sp=0xc000075f78 pc=0x42a569
runtime.bgscavenge(0xc00007c000)
    /usr/local/go/src/runtime/mgcscavenge.go:658 +0x59 fp=0xc000075fc8 sp=0xc000075fa8 pc=0x42ab19
runtime.gcenable.gowrap2()
    /usr/local/go/src/runtime/mgc.go:204 +0x25 fp=0xc000075fe0 sp=0xc000075fc8 pc=0x421465
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000075fe8 sp=0xc000075fe0 pc=0x474301
created by runtime.gcenable in goroutine 1
    /usr/local/go/src/runtime/mgc.go:204 +0xa5

goroutine 18 gp=0xc000104380 m=nil [finalizer wait]:
runtime.gopark(0xc000074648?, 0x414885?, 0xa8?, 0x1?, 0xc0000061c0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000074620 sp=0xc000074600 pc=0x44162e
runtime.runfinq()
    /usr/local/go/src/runtime/mfinal.go:194 +0x107 fp=0xc0000747e0 sp=0xc000074620 pc=0x420507
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000747e8 sp=0xc0000747e0 pc=0x474301
created by runtime.createfing in goroutine 1
    /usr/local/go/src/runtime/mfinal.go:164 +0x3d

goroutine 19 gp=0xc000105c00 m=nil [GC worker (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000070750 sp=0xc000070730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000707e0 sp=0xc000070750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000707e8 sp=0xc0000707e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 20 gp=0xc000105dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x4b312c0?, 0x1?, 0x25?, 0x15?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000070f50 sp=0xc000070f30 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000070fe0 sp=0xc000070f50 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000070fe8 sp=0xc000070fe0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 21 gp=0xc000498000 m=nil [GC worker (idle)]:
runtime.gopark(0x1f7736bb36e?, 0x1?, 0x49?, 0x69?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000071750 sp=0xc000071730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000717e0 sp=0xc000071750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000717e8 sp=0xc0000717e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 5 gp=0xc000007c00 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x3?, 0x1d?, 0x53?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000076750 sp=0xc000076730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0000767e0 sp=0xc000076750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0000767e8 sp=0xc0000767e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 34 gp=0xc000500000 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x1?, 0x9f?, 0x7f?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000506750 sp=0xc000506730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005067e0 sp=0xc000506750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005067e8 sp=0xc0005067e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 6 gp=0xc000007dc0 m=nil [GC worker (idle)]:
runtime.gopark(0x4b312c0?, 0x1?, 0xa3?, 0x96?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000076f50 sp=0xc000076f30 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000076fe0 sp=0xc000076f50 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000076fe8 sp=0xc000076fe0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 35 gp=0xc0005001c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x3?, 0xb1?, 0x8?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000506f50 sp=0xc000506f30 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000506fe0 sp=0xc000506f50 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000506fe8 sp=0xc000506fe0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 36 gp=0xc000500380 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x1?, 0x69?, 0xed?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000507750 sp=0xc000507730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005077e0 sp=0xc000507750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005077e8 sp=0xc0005077e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 37 gp=0xc000500540 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x3?, 0xd5?, 0x79?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000507f50 sp=0xc000507f30 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000507fe0 sp=0xc000507f50 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000507fe8 sp=0xc000507fe0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 38 gp=0xc000500700 m=nil [GC worker (idle)]:
runtime.gopark(0x1f7736bd12b?, 0x3?, 0x53?, 0x4?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000508750 sp=0xc000508730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005087e0 sp=0xc000508750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005087e8 sp=0xc0005087e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 50 gp=0xc000580000 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x1?, 0x8d?, 0x9?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000502750 sp=0xc000502730 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc0005027e0 sp=0xc000502750 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005027e8 sp=0xc0005027e0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 39 gp=0xc0005008c0 m=nil [GC worker (idle)]:
runtime.gopark(0x1f773719d05?, 0x3?, 0x62?, 0x9c?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000508f50 sp=0xc000508f30 pc=0x44162e
runtime.gcBgMarkWorker()
    /usr/local/go/src/runtime/mgc.go:1310 +0xe5 fp=0xc000508fe0 sp=0xc000508f50 pc=0x4235a5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000508fe8 sp=0xc000508fe0 pc=0x474301
created by runtime.gcBgMarkStartWorkers in goroutine 1
    /usr/local/go/src/runtime/mgc.go:1234 +0x1c

goroutine 7 gp=0xc000500c40 m=nil [select, locked to thread]:
runtime.gopark(0xc000505fa8?, 0x2?, 0xc9?, 0x18?, 0xc000505f94?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000505e38 sp=0xc000505e18 pc=0x44162e
runtime.selectgo(0xc000505fa8, 0xc000505f90, 0x0?, 0x0, 0x0?, 0x1)
    /usr/local/go/src/runtime/select.go:327 +0x725 fp=0xc000505f58 sp=0xc000505e38 pc=0x452a85
runtime.ensureSigM.func1()
    /usr/local/go/src/runtime/signal_unix.go:1034 +0x19f fp=0xc000505fe0 sp=0xc000505f58 pc=0x46b73f
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000505fe8 sp=0xc000505fe0 pc=0x474301
created by runtime.ensureSigM in goroutine 1
    /usr/local/go/src/runtime/signal_unix.go:1017 +0xc8

goroutine 40 gp=0xc0004981c0 m=4 mp=0xc00007b808 [syscall]:
runtime.notetsleepg(0x4b2ff80, 0xffffffffffffffff)
    /usr/local/go/src/runtime/lock_futex.go:246 +0x29 fp=0xc0005097a0 sp=0xc000509778 pc=0x412ea9
os/signal.signal_recv()
    /usr/local/go/src/runtime/sigqueue.go:152 +0x29 fp=0xc0005097c0 sp=0xc0005097a0 pc=0x470d69
os/signal.loop()
    /usr/local/go/src/os/signal/signal_unix.go:23 +0x13 fp=0xc0005097e0 sp=0xc0005097c0 pc=0x722f73
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc0005097e8 sp=0xc0005097e0 pc=0x474301
created by os/signal.Notify.func1.1 in goroutine 1
    /usr/local/go/src/os/signal/signal.go:151 +0x1f

goroutine 51 gp=0xc0005808c0 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc00001a718 sp=0xc00001a6f8 pc=0x44162e
runtime.chanrecv(0xc0001e3860, 0x0, 0x1)
    /usr/local/go/src/runtime/chan.go:583 +0x3bf fp=0xc00001a790 sp=0xc00001a718 pc=0x40cd5f
runtime.chanrecv1(0x0?, 0x0?)
    /usr/local/go/src/runtime/chan.go:442 +0x12 fp=0xc00001a7b8 sp=0xc00001a790 pc=0x40c972
github.com/jmorganca/ollama/server.Serve.func2()
    /home/bunneo/ollama/server/routes.go:1028 +0x25 fp=0xc00001a7e0 sp=0xc00001a7b8 pc=0xe9cfe5
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc00001a7e8 sp=0xc00001a7e0 pc=0x474301
created by github.com/jmorganca/ollama/server.Serve in goroutine 1
    /home/bunneo/ollama/server/routes.go:1027 +0x3f6

goroutine 11 gp=0xc000580e00 m=nil [IO wait]:
runtime.gopark(0x10?, 0x10?, 0xf0?, 0x3d?, 0xb?)
    /usr/local/go/src/runtime/proc.go:402 +0xce fp=0xc000073da8 sp=0xc000073d88 pc=0x44162e
runtime.netpollblock(0x486418?, 0x409ee6?, 0x0?)
    /usr/local/go/src/runtime/netpoll.go:573 +0xf7 fp=0xc000073de0 sp=0xc000073da8 pc=0x43a397
internal/poll.runtime_pollWait(0x7e7f8feabd48, 0x72)
    /usr/local/go/src/runtime/netpoll.go:345 +0x85 fp=0xc000073e00 sp=0xc000073de0 pc=0x46ea05
internal/poll.(*pollDesc).wait(0xc000524680?, 0xc0001af631?, 0x0)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27 fp=0xc000073e28 sp=0xc000073e00 pc=0x5030c7
internal/poll.(*pollDesc).waitRead(...)
    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Read(0xc000524680, {0xc0001af631, 0x1, 0x1})
    /usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a fp=0xc000073ec0 sp=0xc000073e28 pc=0x5043ba
net.(*netFD).Read(0xc000524680, {0xc0001af631?, 0xc000073f48?, 0x470a70?})
    /usr/local/go/src/net/fd_posix.go:55 +0x25 fp=0xc000073f08 sp=0xc000073ec0 pc=0x595c85
net.(*conn).Read(0xc00057c0e8, {0xc0001af631?, 0x0?, 0x4b2f300?})
    /usr/local/go/src/net/net.go:179 +0x45 fp=0xc000073f50 sp=0xc000073f08 pc=0x5a3e85
net.(*TCPConn).Read(0x4a389e0?, {0xc0001af631?, 0x0?, 0x0?})
    <autogenerated>:1 +0x25 fp=0xc000073f80 sp=0xc000073f50 pc=0x5b5505
net/http.(*connReader).backgroundRead(0xc0001af620)
    /usr/local/go/src/net/http/server.go:681 +0x37 fp=0xc000073fc8 sp=0xc000073f80 pc=0x6f3517
net/http.(*connReader).startBackgroundRead.gowrap2()
    /usr/local/go/src/net/http/server.go:677 +0x25 fp=0xc000073fe0 sp=0xc000073fc8 pc=0x6f3445
runtime.goexit({})
    /usr/local/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc000073fe8 sp=0xc000073fe0 pc=0x474301
created by net/http.(*connReader).startBackgroundRead in goroutine 8
    /usr/local/go/src/net/http/server.go:677 +0xba

rax    0x0
rbx    0x7e7f417fa640
rcx    0x7e7f8fc969fc
rdx    0x6
rdi    0x58b1
rsi    0x58b9
rbp    0x58b9
rsp    0x7e7f417f8db0
r8     0x7e7f417f8e80
r9     0x0
r10    0x8
r11    0x246
r12    0x6
r13    0x16
r14    0x2243
r15    0x7e7f0307fa44
rip    0x7e7f8fc969fc
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
robertvazan commented 2 months ago

The latest update of the Docker image upgraded to ROCm 6.0, which dropped support for gfx900, so now the Ryzen 5600G does not work even with HSA_OVERRIDE_GFX_VERSION. AMD screwed us. The last working version is 0.1.27 with ROCm 5.7.
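For anyone else stuck on gfx900, a sketch of pinning to that last working release. The `0.1.27-rocm` tag name is an assumption on my part; check Docker Hub for the actual tag. The command is built as a string here for illustration:

```shell
# Assumed tag of the last ROCm 5.7 build -- verify on Docker Hub before use.
IMAGE="ollama/ollama:0.1.27-rocm"
# iGPUs need the KFD and DRI device nodes passed through to the container.
RUN_CMD="docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama $IMAGE"
echo "$RUN_CMD"
```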

@dhiltgen promised support for multiple ROCm versions. I am looking forward to it.

robertvazan commented 2 months ago

Also looking forward to Vulkan support (#2033, #2578), which looks like a better solution than ROCm.

kirel commented 2 months ago

My AMD Ryzen 7 7840HS with Radeon 780M graphics works great with HSA_OVERRIDE_GFX_VERSION=11.0.0. In the BIOS of my minisforum um780xtx mini PC, I set the VRAM mode to UM_SPECIFIED with 16G (I have 32G of RAM).
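A minimal sketch of that setup, assuming an RDNA3 iGPU like the 780M and a manually started server (a systemd install would set this via an `Environment=` drop-in instead):

```shell
# Spoof a supported RDNA3 target for the 780M iGPU (assumption: your chip
# reports a gfx11xx variant that ROCm does not list as supported).
export HSA_OVERRIDE_GFX_VERSION=11.0.0
echo "HSA_OVERRIDE_GFX_VERSION=$HSA_OVERRIDE_GFX_VERSION"
# then start the server in the same shell:
# ollama serve
```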

DocMAX commented 2 months ago

Yeah, sure, that works. But this is about getting Ollama to run with UMA memory and "auto" mode in the BIOS!

robertvazan commented 2 months ago

@kirel Your iGPU is RDNA3, which is still supported by ROCm. ROCm definitely works; it's just that AMD deprecates hardware really quickly (my CPU is 6 months old). Vulkan will hopefully provide wider and longer-lived support without any hacks.

taweili commented 2 months ago

Yeah, sure, that works. But this is about getting Ollama to run with UMA memory and "auto" mode in the BIOS!

I managed to get Ollama and llama.cpp to run on a 5700G with export HSA_ENABLE_SDMA=0. The performance gain isn't much, but I am also looking into the hipHostMalloc hack. You can see more info here.
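A minimal sketch of the SDMA workaround; `HSA_ENABLE_SDMA` is a ROCR-Runtime environment variable, and whether it is needed at all depends on the APU:

```shell
# Disable the SDMA copy engines, which misbehave on some APUs:
export HSA_ENABLE_SDMA=0
echo "HSA_ENABLE_SDMA=$HSA_ENABLE_SDMA"
# then: ollama serve   (or pass -e HSA_ENABLE_SDMA=0 to docker run)
```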

robertvazan commented 2 months ago

I managed to get Ollama and llama.cpp to run on a 5700G with export HSA_ENABLE_SDMA=0.

I can confirm this works with Ryzen 5600G and ROCm 6.0.

It would be ideal to have these overrides stored centrally in ROCm, llama.cpp, or Ollama code.

DocMAX commented 2 months ago

Doesn't work yet: "not enough vram available, falling back to CPU only"

ddpasa commented 2 months ago

Vulkan can really help here: https://github.com/ollama/ollama/pull/2578

llama.cpp has some Vulkan support, but it's in very early stages. You can try the PR above to see if it helps.