jaymeanchante opened this issue 3 months ago
Hi @jaymeanchante, we have reproduced your issue and we are working on resolving it; we will inform you when we make progress.
Hi @jaymeanchante, I can run ollama on Windows with Intel Iris Xe (GPU driver 5534) successfully now. The reason I was able to reproduce your issue was that the GPU driver was not installed correctly. You may verify the env and run ollama with the steps below:

- Run `ls-sycl-device.exe` to check your sycl devices; it's expected to get the results as below (it would be helpful for me to address this issue if you could provide your output).

found 3 SYCL devices:
|ID| Device Type| Name|Version|Max compute units|Max work group|Max sub group|Global mem size| Driver version|
|--|--|--|--|--|--|--|--|--|
| 0| [level_zero:gpu:0]| Intel Iris Xe Graphics| 1.3| 96| 512| 32| 7445M| 1.3.29283|
| 1| [opencl:gpu:0]| Intel Iris Xe Graphics| 3.0| 96| 512| 32| 7445M| 31.0.101.5534|
| 2| [opencl:cpu:0]|11th Gen Intel Core i7-1185G7 @ 3.00GHz| 3.0| 8| 8192| 64| -|-|
Note: Please ensure that you are running `ollama serve` in your `llm-cpp` conda environment.
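For reference, a minimal sketch of that verification sequence, assuming the `llm-cpp` environment name used in this thread and that `ls-sycl-device.exe` is available in (or symlinked into) the working directory:

```
conda activate llm-cpp
.\ls-sycl-device.exe    # should list a [level_zero:gpu:0] entry for the GPU
.\ollama.exe serve      # start the server from the same environment
```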
> Run `ls-sycl-device.exe` to check your sycl devices; it's expected to get the results as below (it would be helpful for me to address this issue if you could provide the output).
ls-sycl-device.exe
found 2 SYCL devices:

|ID| Device Type| Name|Version|Max compute units|Max work group|Max sub group|Global mem size| Driver version|
|--|--|--|--|--|--|--|--|--|
| 0| [level_zero:gpu:0]| Intel Arc A770 Graphics| 1.3| 512| 1024| 32| 16704M| 1.3.29516|
| 1| [opencl:gpu:0]| Intel Arc A770 Graphics| 3.0| 512| 1024| 32| 16704M| 31.0.101.5590|
That's the output of mine, still runs on CPU no matter what I do.
Hi @opticblu,
Hello, first of all thanks for the amazing project.
I was able to run ipex with llama.cpp; it all worked fine, running very fast on both CPU and GPU. However, it didn't work for ollama.
Device: Samsung Book 3 360
OS: Windows 11
GPU: Iris Xe
GPU Driver: 31.0.101.5534
I followed the step-by-step guide:

```
conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]
```

```
mkdir ipex-ollama
cd ipex-ollama
```
With admin privilege:

```
init-ollama.bat
```

It successfully creates the symlinks.
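A quick optional sanity check, sketched in PowerShell (filtering on `LinkType` is just one way to list the symlinks):

```
# List the symlinks init-ollama.bat created in the current directory
Get-ChildItem . | Where-Object { $_.LinkType -eq "SymbolicLink" }
```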
```
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set SYCL_CACHE_PERSISTENT=1
```
In one PowerShell tab I run

```
.\ollama.exe serve
```
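One caveat worth flagging here: `set VAR=value` is cmd.exe syntax. If the four lines above are typed into a PowerShell tab, they do not set environment variables for the `ollama.exe serve` child process; the PowerShell equivalents would be:

```
# PowerShell equivalents of the cmd-style `set` lines above
$env:OLLAMA_NUM_GPU = "999"
$env:no_proxy = "localhost,127.0.0.1"
$env:ZES_ENABLE_SYSMAN = "1"
$env:SYCL_CACHE_PERSISTENT = "1"
```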
In the other tab I successfully run

```
.\ollama.exe -v
.\ollama.exe help
.\ollama.exe pull phi3
```
I visited http://localhost:11434/ and it says "Ollama is running"
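As an extra check beyond the root page, the REST API should answer too; for example, `/api/tags` lists the pulled models (using `curl.exe` explicitly, since plain `curl` is an alias for `Invoke-WebRequest` in PowerShell):

```
curl.exe http://localhost:11434/api/tags
```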
However, if I run

```
.\ollama.exe run phi3
```

I get the following:
```
Error: llama runner process has terminated: exit status 0xc0000135
```
In the server log I see

```
[GIN] 2024/06/09 - 20:46:01 | 200 |       0s | 127.0.0.1 | HEAD "/"
[GIN] 2024/06/09 - 20:46:01 | 200 |  505.3µs | 127.0.0.1 | POST "/api/show"
[GIN] 2024/06/09 - 20:46:01 | 200 |  563.5µs | 127.0.0.1 | POST "/api/show"
time=2024-06-09T20:46:02.141+02:00 level=INFO source=memory.go:133 msg="offload to gpu" layers.requested=-1 layers.real=33 memory.available="3.9 GiB" memory.required.full="3.1 GiB" memory.required.partial="3.1 GiB" memory.required.kv="768.0 MiB" memory.weights.total="2.2 GiB" memory.weights.repeating="2.1 GiB" memory.weights.nonrepeating="77.1 MiB" memory.graph.full="128.0 MiB" memory.graph.partial="128.0 MiB"
time=2024-06-09T20:46:02.147+02:00 level=INFO source=server.go:342 msg="starting llama server" cmd="C:\Users\jayme\repos\ipex-ollama\dist\windows-amd64\ollama_runners\cpu_avx2\ollama_llama_server.exe --model C:\Users\jayme\.ollama\models\blobs\sha256-b26e6713dc749dda35872713fa19a568040f475cc71cb132cff332fe7e216462 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 999 --parallel 1 --port 53957"
time=2024-06-09T20:46:02.150+02:00 level=INFO source=sched.go:338 msg="loaded runners" count=1
time=2024-06-09T20:46:02.151+02:00 level=INFO source=server.go:529 msg="waiting for llama runner to start responding"
time=2024-06-09T20:46:02.151+02:00 level=INFO source=server.go:566 msg="waiting for server to become available" status="llm server error"
time=2024-06-09T20:46:02.409+02:00 level=ERROR source=sched.go:344 msg="error loading llama server" error="llama runner process has terminated: exit status 0xc0000135 "
[GIN] 2024/06/09 - 20:46:02 | 500 | 491.9745ms | 127.0.0.1 | POST "/api/chat"
```
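For what it's worth, exit status 0xc0000135 is STATUS_DLL_NOT_FOUND, and the log shows the `cpu_avx2` runner (not a SYCL one) being launched. A hedged diagnostic sketch, using the runner path from the log above: launching the runner by hand usually surfaces which DLL the loader cannot find.

```
# Any invocation that forces DLL resolution will do; a missing-DLL dialog
# or error should name the culprit.
& "C:\Users\jayme\repos\ipex-ollama\dist\windows-amd64\ollama_runners\cpu_avx2\ollama_llama_server.exe" --help

# Also list which runner variants were shipped (a GPU/SYCL variant should be present)
Get-ChildItem C:\Users\jayme\repos\ipex-ollama\dist\windows-amd64\ollama_runners\
```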