intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
6.56k stars · 1.25k forks

ollama no GPU - Intel Arc A750 Windows and Linux #10823

Open Daroude opened 5 months ago

Daroude commented 5 months ago

ollama doesn't detect the GPU, on either Windows 11 or Linux (Ubuntu 23.10, kernel 6.5, fresh install).

System: Ryzen 5 3600, 64 GB DDR4 RAM, Intel Arc A750

I followed all the installation steps, and other tools, e.g. text-generation-webui, work fine.

```
time=2024-04-21T14:00:41.965+02:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-21T14:00:41.966+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_*.dll"
time=2024-04-21T14:00:42.056+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-04-21T14:00:42.056+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-04-21T14:00:42.071+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-04-21T14:00:42.071+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-21T14:00:42.072+02:00 level=INFO source=gpu.go:121 msg="Detecting GPU type"
time=2024-04-21T14:00:42.072+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library cudart64_*.dll"
time=2024-04-21T14:00:42.087+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-04-21T14:00:42.087+02:00 level=INFO source=gpu.go:268 msg="Searching for GPU management library nvml.dll"
time=2024-04-21T14:00:42.101+02:00 level=INFO source=gpu.go:314 msg="Discovered GPU libraries: []"
time=2024-04-21T14:00:42.101+02:00 level=INFO source=cpu_common.go:11 msg="CPU has AVX2"
time=2024-04-21T14:00:42.103+02:00 level=INFO source=server.go:125 msg="offload to gpu" reallayers=0 layers=0 required="4576.0 MiB" used="677.5 MiB" available="0 B" kv="256.0 MiB" fulloffload="164.0 MiB" partialoffload="677.5 MiB"
time=2024-04-21T14:00:42.112+02:00 level=INFO source=server.go:266 msg="starting llama server" cmd="C:\\Users\\marsc\\AppData\\Local\\Temp\\ollama1126291256\\runners\\cpu_avx2\\ollama_llama_server.exe --model C:\\Users\\marsc\\.ollama\\models\\blobs\\sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 999 --port 50819"
time=2024-04-21T14:00:42.136+02:00 level=INFO source=server.go:397 msg="waiting for llama runner to start responding"
time=2024-04-21T14:00:42.198+02:00 level=ERROR source=server.go:285 msg="error starting llama server" server=cpu_avx2 error="llama runner process no longer running: 3221225781"
time=2024-04-21T14:00:42.199+02:00 level=ERROR source=server.go:293 msg="unable to load any llama server" error="llama runner process no longer running: 3221225781"
```
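[Editor's note] The exit code in the log above decodes to the Windows NTSTATUS value 0xC0000135 (STATUS_DLL_NOT_FOUND), i.e. a required DLL could not be loaded, which is consistent with the oneAPI runtime libraries not being on the path. A one-liner to decode it:

```python
# Decode the llama runner's exit code from the log into hex.
# 0xC0000135 is the Windows NTSTATUS for STATUS_DLL_NOT_FOUND.
code = 3221225781
print(hex(code))  # → 0xc0000135
```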

sgwhat commented 5 months ago

Hi @Daroude,

I have replicated the issue you're experiencing. Please ensure that you have correctly installed and initialized Intel oneAPI.

For example, on Windows:

```
:: initialize the oneAPI environment in the current shell
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
Daroude commented 5 months ago

Problem solved. I should NOT have called setvars.bat, since my Intel oneAPI was installed via pip. That was a reading error on my side. Thanks for the support.

Note: the setvars step above is required only for APT-installed or offline-installed oneAPI. **Skip this step** for pip-installed oneAPI.
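[Editor's note] With pip-installed oneAPI, the runtime libraries live inside the Python environment, so no setvars script is needed. A hedged sketch (the helper name `xpu_available` is hypothetical) for checking from Python whether PyTorch can see an Intel XPU device, assuming a torch build with XPU support (e.g. via intel-extension-for-pytorch) is installed:

```python
# Check whether an Intel XPU device is reachable from this Python env.
# Degrades gracefully: returns False if torch (or its XPU backend)
# is not installed, rather than raising.
import importlib.util


def xpu_available() -> bool:
    """Return True if torch reports an available Intel XPU device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    xpu = getattr(torch, "xpu", None)
    return bool(xpu is not None and xpu.is_available())


print("XPU available:", xpu_available())
```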