likelovewant / ollama-for-amd

Get up and running with Llama 3, Mistral, Gemma, and other large language models, with support for additional AMD GPUs.
https://ollama.com
MIT License

6750GRE 12g #14

Open 21307369 opened 1 month ago

21307369 commented 1 month ago

[ollama-for-amd [v0.3.4] OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-09T00:25:59.140+08:00 level=INFO source=images.go:782 msg="total blobs: 5"
time=2024-08-09T00:25:59.141+08:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-09T00:25:59.143+08:00 level=INFO source=routes.go:1158 msg="Listening on 127.0.0.1:11434 (version 0.3.4-0-g26bd110)"
time=2024-08-09T00:25:59.144+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [rocm_v5.7 cpu cpu_avx cpu_avx2]"
time=2024-08-09T00:25:59.144+08:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-09T00:25:59.172+08:00 level=WARN source=amd_windows.go:97 msg="amdgpu is not supported" gpu=0 gpu_type=gfx1031 library=C:\Users\Administrator\AppData\Local\Programs\Ollama\rocm supported_types=[gfx1103]
time=2024-08-09T00:25:59.172+08:00 level=WARN source=amd_windows.go:99 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-08-09T00:25:59.237+08:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
time=2024-08-09T00:25:59.237+08:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.9 GiB" available="24.9 GiB"

It looks like the GPU cannot be found; the Ollama\rocm\rocblas\library directory is missing Kernels.so-000-gfx1031.hsaco and TensileLibrary_lazy_gfx1031.dat.

Minnow95 commented 1 month ago

I am also using an AMD 6750 GRE 12G. The logs indicate that the GPU is detected, but I encounter error 0xc0000005 when running the llama3:8b and llama3.1:8b models.

2024/08/11 23:00:04 routes.go:1111: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\liaojuncheng\.ollama\models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR:C:\Users\liaojuncheng\AppData\Local\Programs\Ollama\ollama_runners OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-11T23:00:04.867+08:00 level=INFO source=images.go:782 msg="total blobs: 5"
time=2024-08-11T23:00:04.868+08:00 level=INFO source=images.go:790 msg="total unused blobs removed: 0"
time=2024-08-11T23:00:04.869+08:00 level=INFO source=routes.go:1158 msg="Listening on 127.0.0.1:11434 (version 0.3.4-0-g26bd110)"
time=2024-08-11T23:00:04.869+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx cpu_avx2 rocm_v5.7 cpu]"
time=2024-08-11T23:00:04.869+08:00 level=INFO source=gpu.go:204 msg="looking for compatible GPUs"
time=2024-08-11T23:00:05.223+08:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=rocm compute=gfx1031 driver=5.7 name="AMD Radeon RX 6750 GRE 12GB" total="12.0 GiB" available="11.9 GiB"
[GIN] 2024/08/11 - 23:00:05 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/08/11 - 23:00:05 | 200 | 14.8806ms | 127.0.0.1 | POST "/api/show"
time=2024-08-11T23:00:05.567+08:00 level=INFO source=sched.go:185 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2024-08-11T23:00:05.581+08:00 level=INFO source=sched.go:710 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\liaojuncheng\.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa gpu=0 parallel=4 available=12733906944 required="6.2 GiB"
time=2024-08-11T23:00:05.582+08:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=33 layers.offload=33 layers.split="" memory.available="[11.9 GiB]" memory.required.full="6.2 GiB" memory.required.partial="6.2 GiB" memory.required.kv="1.0 GiB" memory.required.allocations="[6.2 GiB]" memory.weights.total="4.7 GiB" memory.weights.repeating="4.3 GiB" memory.weights.nonrepeating="411.0 MiB" memory.graph.full="560.0 MiB" memory.graph.partial="677.5 MiB"
time=2024-08-11T23:00:05.591+08:00 level=INFO source=server.go:392 msg="starting llama server" cmd="C:\Users\liaojuncheng\AppData\Local\Programs\Ollama\ollama_runners\rocm_v5.7\ollama_llama_server.exe --model C:\Users\liaojuncheng\.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 33 --parallel 4 --port 57494"
time=2024-08-11T23:00:05.598+08:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-11T23:00:05.598+08:00 level=INFO source=server.go:592 msg="waiting for llama runner to start responding"
time=2024-08-11T23:00:05.598+08:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=3535 commit="1e6f6554" tid="940" timestamp=1723388405
INFO [wmain] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="940" timestamp=1723388405 total_threads=12
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="11" port="57494" tid="940" timestamp=1723388405
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from C:\Users\liaojuncheng\.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = Meta-Llama-3-8B-Instruct
llama_model_loader: - kv 2: llama.block_count u32 = 32
llama_model_loader: - kv 3: llama.context_length u32 = 8192
llama_model_loader: - kv 4: llama.embedding_length u32 = 4096
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.attention.head_count u32 = 32
llama_model_loader: - kv 7: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 8: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: general.file_type u32 = 2
llama_model_loader: - kv 11: llama.vocab_size u32 = 128256
llama_model_loader: - kv 12: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 18: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 19: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 20: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_0: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
time=2024-08-11T23:00:06.056+08:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server not responding"
time=2024-08-11T23:00:08.144+08:00 level=INFO source=server.go:626 msg="waiting for server to become available" status="llm server error"
time=2024-08-11T23:00:08.397+08:00 level=ERROR source=sched.go:451 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000005"
[GIN] 2024/08/11 - 23:00:08 | 500 | 3.1481966s | 127.0.0.1 | POST "/api/chat"

likelovewant commented 1 month ago

ocal\Programs\Ollama\rocm supported_types=[gfx1103]
time=2024-08-09T00:25:59.172+08:00 level=WARN source=amd_windows.go:99 msg="See https://github.com/ollama/ollama/blob/main/docs/troubleshooting.md for HSA_OVERRIDE_GFX_VERSION usage"
time=2024-08-09T00:25:59.237+08:00 level=INFO source=gpu.go:347 msg="no compatible GPUs were discovered"
time=2024-08-09T00:25:59.237+08:00 level=INFO source=types.go:105 msg="inference compute" id=0 library=cpu compute="" driver=0.0 name="" total="31.9 GiB" available="24.9 GiB"

It looks like the GPU cannot be found. @21307369 As explained on the wiki page, you need to replace the corresponding rocblas.dll and library folder yourself; you have not replaced them yet. Replace your rocblas libs as described in the wiki guide.
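
A minimal sketch of that replacement step for a gfx1031 card. The Ollama install path is taken from the log above; the download folder D:\Downloads\rocm-gfx1031 is only an illustration of wherever you unpacked the gfx1031 rocblas package, and newer Ollama builds may keep these files under lib\ollama instead:

  :: back up the stock ROCm BLAS files shipped with Ollama, then drop in the gfx1031 build
  cd %LOCALAPPDATA%\Programs\Ollama\rocm
  ren rocblas.dll rocblas.dll.bak
  copy D:\Downloads\rocm-gfx1031\rocblas.dll .
  xcopy /E /I /Y D:\Downloads\rocm-gfx1031\library rocblas\library
  :: the library folder should now contain the gfx1031 kernels mentioned above
  dir rocblas\library | findstr gfx1031

After restarting Ollama, the earlier "amdgpu is not supported ... supported_types=[gfx1103]" warning should no longer appear if the files match your GPU.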

likelovewant commented 1 month ago

el will fit in available VRAM in single GPU, loading" model=C:\Users\liaojuncheng.ollama\models\blobs\sha256-6a0746a1ec1aef3e7ec53868f220ff6e389f6f8ef87a01d77c96807de94ca2aa gpu=0 parallel=4 available=

@Minnow95 The line "one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency" points to an issue from very early versions that was fixed before 0.2. Try disabling your iGPU (the extra device behind "one or more GPUs detected") if you have one, or test an earlier version. You could also try a different rocmlibs build for gfx1031.

koverlu commented 1 week ago

I've met the same problem. On Windows, I installed HIP SDK 6.1.2 and ollama-for-amd, and replaced the rocblas lib. I've also tried the ROCm 5.7 versions; same problem.

likelovewant commented 1 week ago

Addressing ROCm and Ollama Compatibility Issues

From v0.3.8, this repo has been updated to HIP SDK 6.1.2, because many users had mismatched HIP SDK versions.

Important:

  • HIP SDK Update: Ollama now uses HIP SDK 6.1.2 to resolve version mismatches with previous releases.
  • ROCm Version Check: If you want to test Ollama with ROCm 5.7 on v0.3.8, ensure you have ROCm 5.7 installed and replace the hipblas.dll file with the one from ROCm 5.7 (see the example below). Failing to do so may result in an "error 0xc0000005" due to memory detection inconsistencies.
  • Driver Update: Always ensure you are using the latest drivers for optimal performance and stability.
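
An illustrative version of that hipblas.dll swap. The source path assumes a default HIP SDK 5.7 install; locate the copy bundled inside your Ollama folder first, since the layout differs between releases:

  :: find where the bundled hipblas.dll lives in this Ollama release
  where /R %LOCALAPPDATA%\Programs\Ollama hipblas.dll
  :: overwrite it with the ROCm 5.7 build (destination is whatever path the command above printed)
  copy "C:\Program Files\AMD\ROCm\5.7\bin\hipblas.dll" "<path printed above>"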

Troubleshooting Steps:

  1. GPU Device Number: Run hipinfo in your terminal to determine your GPU device number(s).

  2. Environment Variable: Set the HIP_VISIBLE_DEVICES environment variable to specify which GPU(s) Ollama should use. For example: HIP_VISIBLE_DEVICES=0,1 (adjust the numbers based on your output from hipinfo).

  3. Alternative Installation: For users experiencing persistent issues, consider installing Ollama directly from the official repository and following the HSA_OVERRIDE_GFX_VERSION instructions in the wiki (wiki#demo-release-version); this may provide a more stable and compatible environment. Set HSA_OVERRIDE_GFX_VERSION=10.1.2 (change this to match your GPU arch, e.g. 10.3.1), replace the rocmlibs as well (e.g. gfx1031), run ollama serve, then open a new terminal and run ollama run llama3.1 (see the example commands after this list).
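
Putting steps 1-3 together, a hedged example for a single gfx1031 card; the device index 0 and the 10.3.1 value are taken from this thread, and the override line is only needed when using the official Ollama build together with the gfx1031 rocmlibs:

  hipinfo
  :: suppose hipinfo reports the RX 6750 GRE as device 0
  set HIP_VISIBLE_DEVICES=0
  :: only for the official Ollama build with replaced gfx1031 rocmlibs
  set HSA_OVERRIDE_GFX_VERSION=10.3.1
  ollama serve
  :: then, in a second terminal
  ollama run llama3.1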

@koverlu

koverlu commented 1 week ago
2024/09/07 17:51:43 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR:C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\runners OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-09-07T17:51:43.020+08:00 level=INFO source=images.go:753 msg="total blobs: 9"
time=2024-09-07T17:51:43.020+08:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-07T17:51:43.021+08:00 level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11434 (version 0.3.8-0-g76feb6c)"
time=2024-09-07T17:51:43.021+08:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu cpu_avx cpu_avx2 rocm_v6.1]"
time=2024-09-07T17:51:43.021+08:00 level=INFO source=gpu.go:200 msg="looking for compatible GPUs"
time=2024-09-07T17:51:43.443+08:00 level=INFO source=types.go:107 msg="inference compute" id=0 library=rocm variant="" compute=gfx1031 driver=6.1 name="AMD Radeon RX 6750 GRE 12GB" total="12.0 GiB" available="11.8 GiB"
[GIN] 2024/09/07 - 17:51:43 | 200 |       541.2µs |       127.0.0.1 | HEAD     "/"
[GIN] 2024/09/07 - 17:51:43 | 200 |     20.4052ms |       127.0.0.1 | POST     "/api/show"
time=2024-09-07T17:51:43.860+08:00 level=INFO source=sched.go:185 msg="one or more GPUs detected that are unable to accurately report free memory - disabling default concurrency"
time=2024-09-07T17:51:43.897+08:00 level=INFO source=sched.go:715 msg="new model will fit in available VRAM in single GPU, loading" model=C:\Users\Administrator\.ollama\models\blobs\sha256-732ecb253ea0115453438fc1f4e3e31507719ddcf81890a86ad1d734beefdb6f gpu=0 parallel=4 available=12720537600 required="9.1 GiB"
time=2024-09-07T17:51:43.898+08:00 level=INFO source=memory.go:309 msg="offload to rocm" layers.requested=-1 layers.model=43 layers.offload=43 layers.split="" memory.available="[11.8 GiB]" memory.required.full="9.1 GiB" memory.required.partial="9.1 GiB" memory.required.kv="2.6 GiB" memory.required.allocations="[9.1 GiB]" memory.weights.total="7.3 GiB" memory.weights.repeating="6.6 GiB" memory.weights.nonrepeating="717.8 MiB" memory.graph.full="507.0 MiB" memory.graph.partial="1.2 GiB"
time=2024-09-07T17:51:43.902+08:00 level=INFO source=server.go:391 msg="starting llama server" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\runners\\rocm_v6.1\\ollama_llama_server.exe --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-732ecb253ea0115453438fc1f4e3e31507719ddcf81890a86ad1d734beefdb6f --ctx-size 8192 --batch-size 512 --embedding --log-disable --n-gpu-layers 43 --parallel 4 --port 60780"
time=2024-09-07T17:51:43.904+08:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2024-09-07T17:51:43.904+08:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2024-09-07T17:51:43.905+08:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
INFO [wmain] build info | build=3535 commit="1e6f6554" tid="15280" timestamp=1725702703
INFO [wmain] system info | n_threads=6 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="15280" timestamp=1725702703 total_threads=12
INFO [wmain] HTTP server listening | hostname="127.0.0.1" n_threads_http="11" port="60780" tid="15280" timestamp=1725702703
llama_model_loader: loaded meta data with 33 key-value pairs and 464 tensors from C:\Users\Administrator\.ollama\models\blobs\sha256-732ecb253ea0115453438fc1f4e3e31507719ddcf81890a86ad1d734beefdb6f (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = gemma2
llama_model_loader: - kv   1:                               general.name str              = Smegmma-Deluxe-9B-v1
llama_model_loader: - kv   2:                      gemma2.context_length u32              = 8192
llama_model_loader: - kv   3:                    gemma2.embedding_length u32              = 3584
llama_model_loader: - kv   4:                         gemma2.block_count u32              = 42
llama_model_loader: - kv   5:                 gemma2.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                gemma2.attention.head_count u32              = 16
llama_model_loader: - kv   7:             gemma2.attention.head_count_kv u32              = 8
llama_model_loader: - kv   8:    gemma2.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv   9:                gemma2.attention.key_length u32              = 256
llama_model_loader: - kv  10:              gemma2.attention.value_length u32              = 256
llama_model_loader: - kv  11:                          general.file_type u32              = 15
llama_model_loader: - kv  12:              gemma2.attn_logit_softcapping f32              = 50.000000
llama_model_loader: - kv  13:             gemma2.final_logit_softcapping f32              = 30.000000
llama_model_loader: - kv  14:            gemma2.attention.sliding_window u32              = 4096
llama_model_loader: - kv  15:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  16:                         tokenizer.ggml.pre str              = default
time=2024-09-07T17:51:44.367+08:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server not responding"
time=2024-09-07T17:51:45.564+08:00 level=ERROR source=sched.go:456 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000005"
[GIN] 2024/09/07 - 17:51:45 | 500 |    2.0915903s |       127.0.0.1 | POST     "/api/chat"

HIP SDK 6.1.2, ollama-for-amd 0.3.8, libs replaced with rocm.gfx1031.for.hip.sdk.6.1.2.optimized.with.little.wu.s.logic.7z, with set HIP_VISIBLE_DEVICES=0 and set HSA_OVERRIDE_GFX_VERSION=10.3.1. I still seem to get the same problem.

likelovewant commented 1 week ago

It seems this issue happens only on a limited set of GPUs. You may test these libs: rocm.gfx1031.for.hip.sdk.6.1.2.7z. set HSA_OVERRIDE_GFX_VERSION=10.3.1 is only needed when you are using the official Ollama version together with the rocm libs for gfx1031. Also make sure your driver is up to date: always use the latest drivers for optimal performance and stability. Or try some smaller models, even though that might not be the issue.

@koverlu

Minnow95 commented 1 week ago

@koverlu I replaced the graphics card driver and checked the "Restore Factory Settings" option during installation. It worked successfully. The virtual graphics card driver and version can both have an impact.