likelovewant / ROCmLibs-for-gfx1103-AMD780M-APU

ROCm library files for gfx1103, updated with other arches based on AMD GPUs, for use in Windows.
GNU General Public License v3.0

llama runner exit status 0xc0000005 on Ryzen 4500U Pro APU #6

Closed EdoaLive closed 1 month ago

EdoaLive commented 2 months ago

I think this is related to #4, but I'm opening a new issue because something seems to have changed after it. I'm using likelovewant/ollama-for-amd 0.3.6. I already tried various combinations of the binaries in the 5.7 release, but I always get the same result. I also tried various HSA_OVERRIDE_GFX_VERSION values without success (based on suggestions from #4), and tried smaller models to rule out a VRAM size problem, with the same result. If I understand correctly, the Ryzen 4500U Pro I have is gfx90c (with xnack-? Does it depend on the driver?)

Can you help me get this running? Also maybe this is an ollama-for-amd issue?

Thanks

The log:

2024/08/27 16:55:27 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\eddyx\\.ollama\\models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR:C:\\Users\\eddyx\\AppData\\Local\\Programs\\Ollama\\ollama_runners OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-27T16:55:27.317+02:00 level=INFO source=images.go:782 msg="total blobs: 12"
time=2024-08-27T16:55:27.318+02:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-27T16:55:27.319+02:00 level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11434 (version 0.3.6-0-g28832df)"
time=2024-08-27T16:55:27.320+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm_v5.7]"
time=2024-08-27T16:55:27.320+02:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-27T16:55:27.810+02:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx90c:xnack- driver=5.7 name="AMD Radeon(TM) Graphics" total="8.5 GiB" available="8.3 GiB"
[GIN] 2024/08/27 - 16:55:32 | 200 |            0s |       127.0.0.1 | HEAD     "/"
[GIN] 2024/08/27 - 16:55:32 | 200 |     34.7028ms |       127.0.0.1 | POST     "/api/show"
time=2024-08-27T16:55:32.991+02:00 level=INFO source=sched.go:185 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2024-08-27T16:55:33.024+02:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\eddyx\.ollama\models\blobs\sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 gpu=0 parallel=4 available=8931934208 required="2.7 GiB"
time=2024-08-27T16:55:33.024+02:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=19 layers.offload=19 layers.split="" memory.available="[8.3 GiB]" memory.required.full="2.7 GiB" memory.required.partial="2.7 GiB" memory.required.kv="144.0 MiB" memory.required.allocations="[2.7 GiB]" memory.weights.total="1.2 GiB" memory.weights.repeating="675.9 MiB" memory.weights.nonrepeating="531.5 MiB" memory.graph.full="504.2 MiB" memory.graph.partial="914.6 MiB"
time=2024-08-27T16:55:33.033+02:00 level=INFO source=server.go:393 msg="starting llama server" cmd="C:\\Users\\eddyx\\AppData\\Local\\Programs\\Ollama\\ollama_runners\\rocm_v5.7\\ollama_llama_server.exe --model C:\\Users\\eddyx\\.ollama\\models\\blobs\\sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 19 --parallel 4 --port 61830"
time=2024-08-27T16:55:33.037+02:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-27T16:55:33.037+02:00 level=INFO source=server.go:593 msg="waiting for llama runner to start responding"
time=2024-08-27T16:55:33.039+02:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=3535 commit="1e6f6554" tid="13064" timestamp=1724770533
INFO [wmain] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="13064" timestamp=1724770533 total_threads=12
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="11" port="61830" tid="13064" timestamp=1724770533
llama_model_loader: loaded meta data with 21 key-value pairs and 164 tensors from C:\Users\eddyx\.ollama\models\blobs\sha256-c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma
llama_model_loader: - kv   1:                               general.name str              = gemma-2b-it
llama_model_loader: - kv   2:                       gemma.context_length u32              = 8192
llama_model_loader: - kv   3:                          gemma.block_count u32              = 18
llama_model_loader: - kv   4:                     gemma.embedding_length u32              = 2048
llama_model_loader: - kv   5:                  gemma.feed_forward_length u32              = 16384
llama_model_loader: - kv   6:                 gemma.attention.head_count u32              = 8
llama_model_loader: - kv   7:              gemma.attention.head_count_kv u32              = 1
llama_model_loader: - kv   8:                 gemma.attention.key_length u32              = 256
llama_model_loader: - kv   9:               gemma.attention.value_length u32              = 256
llama_model_loader: - kv  10:     gemma.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                tokenizer.ggml.bos_token_id u32              = 2
llama_model_loader: - kv  13:                tokenizer.ggml.eos_token_id u32              = 1
llama_model_loader: - kv  14:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  15:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  16:                      tokenizer.ggml.tokens arr[str,256128]  = ["<pad>", "<eos>", "<bos>", "<unk>", ...
time=2024-08-27T16:55:33.505+02:00 level=INFO source=server.go:627 msg="waiting for server to become available" status="llm server not responding"
time=2024-08-27T16:55:35.175+02:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000005"
[GIN] 2024/08/27 - 16:55:35 | 500 |    2.6579798s |       127.0.0.1 | POST     "/api/chat"
likelovewant commented 2 months ago

Addressing "llama runner process has terminated: exit status 0xc0000005" Error

While the rocblas.for.gfx90c.workable.7z package (v0.5.7) from this GitHub release, combined with the HSA_OVERRIDE_GFX_VERSION settings, does work for gfx90c, a recurring problem across various machines is the "llama runner process has terminated: exit status 0xc0000005" error.

This error likely stems from unknown local configurations or from the libraries being mishandled by the llama runner. Any version between 5.7 and 6.1.2 can trigger it.

Possible Solutions:

  1. Try earlier Ollama versions: explore versions released before ollama-for-amd 0.3.6, as they might be more compatible with the current setup.
  2. Awaiting future releases: consider waiting for a future release that incorporates support for the 6.1.2 SDK; this update might resolve the issue directly.
  3. Test with the official Ollama release: set HSA_OVERRIDE_GFX_VERSION=9.1.2 and use the 6.1.2 libs for gfx90c (see the sketch after this list).
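
A minimal PowerShell sketch of option 3, assuming the default Ollama install location. The exact spot where Ollama keeps its bundled rocblas.dll has changed between versions, so the search step below is only illustrative:

```powershell
# Hypothetical: locate the rocblas.dll bundled with Ollama's ROCm runner,
# then replace it and its library folder with the gfx90c 6.1.2 files
# from the release archive (destination varies by Ollama version).
Get-ChildItem "$env:LOCALAPPDATA\Programs\Ollama" -Recurse -Filter rocblas.dll

# Session-only override, value taken from the suggestion above.
$env:HSA_OVERRIDE_GFX_VERSION = "9.1.2"

# Start the official Ollama server from its program folder.
cd "$env:LOCALAPPDATA\Programs\Ollama"
./ollama serve
```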

Let me know if you have any further questions or need additional assistance. @EdoaLive

EdoaLive commented 2 months ago

Hi

> Awaiting future releases: consider waiting for a future release that incorporates support for the 6.1.2 SDK; this update might resolve the issue directly.

I just tried your new release of ollama-for-amd (0.3.8) and now it always says "no compatible GPUs were discovered". I tried of course with the 6.1.2 bundles (xnack- and normal), and also tried HSA_OVERRIDE_GFX_VERSION. It never sees the GPU. BTW, I noticed the path for rocblas.dll has changed; should the wiki be updated with the new path? Or is it maybe a path issue?

Thanks. I'll also try an older ollama-for-amd version in a bit.

P.S. Maybe this issue should be moved to ollama-for-amd repo?

likelovewant commented 2 months ago

This question is really an issue for ollama-for-amd; anyway, you can still use this thread even if it's not the ideal place for it. The path should not be the problem: as long as you start the server in the Ollama program folder, it should be able to start.

Steps:

  1. Install ROCm 6.1.2: ensure you have ROCm 6.1.2 installed on your system.
  2. Navigate to the Ollama program folder (e.g., `C:\Users\Names\AppData\Local\Programs\Ollama`).
  3. Environment variables (see the sketch after these steps):
    • Set HSA_OVERRIDE_GFX_VERSION="9.1.2" in your terminal environment variables.
    • Set HCC_AMDGPU_TARGET="gfx90c" in your terminal environment variables,
    • or set HCC_AMDGPU_TARGET="gfx90c:xnack-" instead.
  4. Start the Ollama server by running:
    ./ollama serve
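
A PowerShell sketch of steps 2–4 for one terminal session, assuming the default install path from step 2 (set only one of the two HCC_AMDGPU_TARGET values):

```powershell
# Step 2: go to the Ollama program folder (adjust the path if yours differs).
cd "$env:LOCALAPPDATA\Programs\Ollama"

# Step 3: environment variables for this session only.
$env:HSA_OVERRIDE_GFX_VERSION = "9.1.2"
$env:HCC_AMDGPU_TARGET = "gfx90c"        # or "gfx90c:xnack-"

# Step 4: start the server.
./ollama serve
```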

Remember that this setup is not officially supported by Ollama and may require further adjustments based on your specific architecture.

likelovewant commented 2 months ago

Hope this method can help you if you haven't been able to run on the GPU yet: https://github.com/likelovewant/ROCmLibs-for-gfx1103-AMD780M-APU/issues/8#issuecomment-2344082165 @EdoaLive

EdoaLive commented 2 months ago

Thanks @likelovewant for tagging me. I tried to shuffle around the DLLs as #8 said, but I still get the same result: "no compatible GPUs were discovered". Then I tried, as suggested there, running runners\rocm_v6.1\ollama_llama_server.exe directly, and it said some DLLs were still missing, so I installed the AMD HIP 6.1 SDK to get those DLLs.

Now, running directly in C:\Program Files\AMD\ROCm\6.1\bin after replacing rocblas.dll and the library folder, the runner starts but says: rocBLAS error: Could not initialize Tensile host: No devices found. When running ollama serve in that dir, I get an additional error: source=amd_hip_windows.go:103 msg="AMD ROCm reports no devices found" (BTW I updated to ollama-for-amd v0.3.10). The HSA_OVERRIDE_GFX_VERSION and HCC_AMDGPU_TARGET variables seem to be ignored.

All this (not finding the GPU) started happening after switching to ROCm v6.1, which should give the best results. With older versions of ollama I think I have even less chance of getting it to work (?)
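
For reference, a hedged PowerShell sketch of that replacement step, assuming the HIP SDK 6.1 layout mentioned above and that the gfx90c files from the release archive sit in the current directory (the rocblas\library subfolder is the usual HIP SDK location, but verify it on your install):

```powershell
$rocm = "C:\Program Files\AMD\ROCm\6.1\bin"

# Back up the SDK's original rocblas before overwriting it.
Copy-Item "$rocm\rocblas.dll" "$rocm\rocblas.dll.bak"

# Overwrite with the gfx90c build from the unpacked release archive
# (source paths are placeholders for wherever the 7z was extracted).
Copy-Item ".\rocblas.dll" "$rocm\rocblas.dll" -Force
Copy-Item ".\library\*" "$rocm\rocblas\library\" -Recurse -Force
```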

Edit: note: if I run amdgpu-arch.exe I get this so I'm quite sure this is my GPU:

C:\Program Files\AMD\ROCm\6.1\bin>amdgpu-arch.exe
gfx90c:xnack-
likelovewant commented 2 months ago

> rocBLAS error: Could not initialize Tensile host: No devices found.
> AMD ROCm reports no devices found

Make sure you have the latest drivers.

GPU Device Number: Run hipinfo in your terminal to determine your GPU device number(s).

Environment Variable: Set the HIP_VISIBLE_DEVICES environment variable to specify which GPU(s) Ollama should use. For example: HIP_VISIBLE_DEVICES=0,1 (adjust the numbers based on your output from hipinfo).

If no devices show up, try testing with 0, 1, or 2.

This can be set on the command line with `set HIP_VISIBLE_DEVICES=0,1`, or the hard way as described below.

Then run `./ollama serve` or `./ollama_llama_server.exe` (see the sketch below).
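
A small PowerShell sketch of that test, assuming device 0 (substitute the number reported by hipinfo, or try 1 or 2 as noted above):

```powershell
# Restrict ROCm to a specific device index for this session.
$env:HIP_VISIBLE_DEVICES = "0"

# Then start the server (or the runner directly) from the Ollama folder.
./ollama serve
# ./ollama_llama_server.exe
```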

You may also try setting it in the system environment, as in this guide: https://www.computerhope.com/issues/ch000549.htm. Create a new entry in the system variables with the name HSA_OVERRIDE_GFX_VERSION and the value 9.0.12.
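
As a command-line alternative to the GUI steps in that guide, a sketch using `setx`, which stores a persistent user-level variable (it only affects terminals opened afterwards):

```powershell
# Persist the override for new sessions (user scope).
setx HSA_OVERRIDE_GFX_VERSION "9.0.12"
```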

If all of that fails, you may test whether the same method works on a previous version that supports HIP SDK 5.7, e.g. v0.3.6 or earlier.

Hope this info helps so far.

@EdoaLive

EdoaLive commented 2 months ago

hipinfo also could not find the GPU device:

C:\Program Files\AMD\ROCm\6.1\bin>hipInfo.exe

checkHipErrors() HIP API error = 0100 "no ROCm-capable device is detected" from file <C:\constructicon\builds\gfx\eleven\24.10\drivers\compute\hip-tests\samples\1_Utils\hipInfo\hipInfo.cpp>, line 192.

Are you sure that the package for 90c:xnack- for 6.1.2 actually contains the driver and libraries for that architecture? The AMD GPU drivers I have installed are the latest.

EdoaLive commented 2 months ago

For reference this is hipinfo.exe from HIP SDK 5.7:

C:\Program Files\AMD\ROCm\5.7\bin>hipInfo.exe

--------------------------------------------------------------------------------
device#                           0
Name:                             AMD Radeon(TM) Graphics
pciBusID:                         8
pciDeviceID:                      0
pciDomainID:                      0
multiProcessorCount:              6
maxThreadsPerMultiProcessor:      2560
isMultiGpuBoard:                  0
clockRate:                        1500 Mhz
memoryClockRate:                  1333 Mhz
memoryBusWidth:                   0
totalGlobalMem:                   8.46 GB
totalConstMem:                    2147483647
sharedMemPerBlock:                64.00 KB
canMapHostMemory:                 1
regsPerBlock:                     0
warpSize:                         64
l2CacheSize:                      4194304
computeMode:                      0
maxThreadsPerBlock:               1024
maxThreadsDim.x:                  1024
maxThreadsDim.y:                  1024
maxThreadsDim.z:                  1024
maxGridSize.x:                    2147483647
maxGridSize.y:                    65536
maxGridSize.z:                    65536
major:                            9
minor:                            0
concurrentKernels:                1
cooperativeLaunch:                0
cooperativeMultiDeviceLaunch:     0
isIntegrated:                     0
maxTexture1D:                     16384
maxTexture2D.width:               16384
maxTexture2D.height:              16384
maxTexture3D.width:               2048
maxTexture3D.height:              2048
maxTexture3D.depth:               2048
isLargeBar:                       0
asicRevision:                     0
maxSharedMemoryPerMultiProcessor: 64.00 KB
clockInstructionRate:             1000.00 Mhz
arch.hasGlobalInt32Atomics:       1
arch.hasGlobalFloatAtomicExch:    1
arch.hasSharedInt32Atomics:       1
arch.hasSharedFloatAtomicExch:    1
arch.hasFloatAtomicAdd:           1
arch.hasGlobalInt64Atomics:       1
arch.hasSharedInt64Atomics:       1
arch.hasDoubles:                  1
arch.hasWarpVote:                 1
arch.hasWarpBallot:               1
arch.hasWarpShuffle:              1
arch.hasFunnelShift:              0
arch.hasThreadFenceSystem:        1
arch.hasSyncThreadsExt:           0
arch.hasSurfaceFuncs:             0
arch.has3dGrid:                   1
arch.hasDynamicParallelism:       0
gcnArchName:                      gfx90c:xnack-
peers:
non-peers:                        device#0

memInfo.total:                    8.46 GB
memInfo.free:                     8.32 GB (98%)

This actually works even without rocblas.dll and library folder 🤔

likelovewant commented 2 months ago

> hipinfo also could not find the GPU device:
>
> C:\Program Files\AMD\ROCm\6.1\bin>hipInfo.exe
>
> checkHipErrors() HIP API error = 0100 "no ROCm-capable device is detected" from file <C:\constructicon\builds\gfx\eleven\24.10\drivers\compute\hip-tests\samples\1_Utils\hipInfo\hipInfo.cpp>, line 192.
>
> Are you sure that the package for 90c:xnack- for 6.1.2 actually contains the driver and libraries for that architecture? The AMD GPU drivers I have installed are the latest.

A similar issue happened on 5.7: https://github.com/ROCm/ROCm/issues/2941, even if it may not be the same situation. There could be a way to solve this; it's not known yet.

I have received feedback from others: hipinfo cannot detect this GPU on Windows, but it can be detected on Linux by tweaking some code. Therefore I suggest testing with `set HIP_VISIBLE_DEVICES=0,1`.

If neither works, then you probably need to debug it using a previous Ollama version that supports 5.7. Once you are able to make it work, you may build Ollama from source as described in the wiki (see the sketch below).
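
If you go the build-from-source route, the outline was roughly the standard Ollama Windows build of that era (a sketch only; it assumes Go, a C compiler, and the matching HIP SDK are installed, and the wiki's exact steps take precedence):

```powershell
git clone https://github.com/likelovewant/ollama-for-amd.git
cd ollama-for-amd

# Generate the GPU runners (this picks up the installed HIP SDK), then build.
go generate ./...
go build .
```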

Alternatively, you may test other LLM clients, e.g. lmstudio.ai, which recently added Vulkan support and supports most AMD GPUs.

EdoaLive commented 1 month ago

Hi @likelovewant, I ended up rebuilding everything on Linux. Do you happen to have the files to replace for the rebuild of the ROCm libraries for 6.1.2? Or are they the same ones I can find in the 5.7 release?

Also, with the Linux driver, rocminfo sees the GPU as xnack+. Is it possible to use xnack+ with Ollama to use system RAM instead of the small VRAM?

Thanks

likelovewant commented 1 month ago

The library files are the same for both Linux and Windows. However, on Linux you also need another library, something like librocblas.so (the name is not exact), which is complex to build. If you need to build it, you can find the logic files used to build the Linux rocblas in the gfx90c logic files; however, it's not recommended. Instead, you may use export HSA_OVERRIDE_GFX_VERSION=9.0.0 to fake your GPU as gfx900, which is supported by Linux ROCm (try a ROCm version earlier than 6.2.0, as later versions drop that support; see the sketch below). You may also test gfx908 (9.0.8) or gfx1010 (10.1.0).
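
A minimal Linux sketch of that override, assuming Ollama and a ROCm release older than 6.2.0 are already installed:

```bash
# Present the gfx90c iGPU to ROCm as gfx900 (a supported target).
export HSA_OVERRIDE_GFX_VERSION=9.0.0
ollama serve
```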

If you want to build the ROCm library from source on Linux, then you simply add gfx90c:xnack+ to the target list together with the attached logic file; it should build.

If you want to use system RAM as your VRAM, you may test GTT as explained in https://github.com/ollama/ollama/pull/6282 @EdoaLive