intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Running Ollama in Docker with iGPU keeps failing to generate #12116

Open Daniel-dev22 opened 6 days ago

Daniel-dev22 commented 6 days ago

When I run Ollama with Docker, every time it is asked to generate via the API I see

msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"

I am running this on Ubuntu 24.04. I know the Docker image includes the Intel oneAPI packages, but should the host have them installed as well? I couldn't find docs addressing that.
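For what it's worth, here is roughly how I sanity-checked that the container can see the iGPU at all ("ollama" is the container name from the compose file below; I'm assuming sycl-ls is on the PATH inside the ipex-llm image, otherwise the oneAPI environment needs to be sourced first):

# Confirm the render node from the host was actually mapped into the container
docker exec -it ollama ls -l /dev/dri

# List the SYCL devices visible inside the container; the iGPU should show up
# as an opencl:gpu and an ext_oneapi_level_zero:gpu entry
docker exec -it ollama sycl-ls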

vainfo is the latest version offered by apt on my system.

vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.0 ()

Docker logs attached: ollama_logs (5).txt

sycl-ls

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics 1.3 [1.3.26241]

Docker Compose - it took a lot of extra work to figure out how to actually start Ollama under Docker, as the docs didn't cover it.

  ollama:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest
    container_name: ollama
    restart: unless-stopped
    command: /bin/bash -c "cd /llm/scripts && source ipex-llm-init --gpu --device iGPU && init-ollama && ./ollama serve"
    networks:
      traefik-network:
        ipv4_address: 192.168.10.71
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /docker_container_volumes/ollama:/root/.ollama
    environment:
     # - no_proxy=localhost,127.0.0.1
      - bench_model=mistral-7b-v0.1.Q4_0.gguf
      - DEVICE=iGPU
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_GPU=999
      - OLLAMA_INTEL_GPU=1
      - GIN_MODE=release
      - NEOReadDebugKeys=1
      - OverrideGpuAddressSpace=48
      - ZES_ENABLE_SYSMAN=1
    shm_size: '16g'
    mem_limit: '8g'
    tty: true
    stdin_open: true
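For completeness, this is roughly how I bring it up and trigger a generation to reproduce the failure (the model name is just an example, and I'm assuming the default Ollama port 11434 on the static IP above):

# Start the service and watch the logs
docker compose up -d ollama
docker logs -f ollama

# In another shell: pull a model and request a generation through the API;
# "mistral" is only an example model, 11434 is the default Ollama port
curl http://192.168.10.71:11434/api/pull -d '{"name": "mistral"}'
curl http://192.168.10.71:11434/api/generate -d '{"model": "mistral", "prompt": "hello"}'
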
JinheTang commented 4 days ago

Hi @Daniel-dev22, the latest version of ipex-llm[cpp] seems to have this problem on Linux. You can downgrade ipex-llm[cpp] in the container to the last verified working version with:

pip install ipex-llm[cpp]==2.2.0b20240911
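
If you don't want to rebuild the image, a quick way to apply this in the running container (container name taken from the compose file above) is something like:

# Downgrade ipex-llm[cpp] inside the running container, then restart it so
# init-ollama re-links the ollama binary against the downgraded package
docker exec -it ollama pip install "ipex-llm[cpp]==2.2.0b20240911"
docker restart ollama

Note that this change only lives in the container filesystem, so it survives a restart but not recreating the container (e.g. docker compose up --force-recreate).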

Additionally, if you get garbage output after the downgrade, you can run the following commands in the container to update intel-level-zero-gpu:

mkdir neo
cd neo
wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.15468.11/intel-igc-core_1.0.15468.11_amd64.deb
wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.15468.11/intel-igc-opencl_1.0.15468.11_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-level-zero-gpu-dbgsym_1.3.27642.18_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-level-zero-gpu_1.3.27642.18_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-opencl-icd-dbgsym_23.43.27642.18_amd64.ddeb
wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-opencl-icd_23.43.27642.18_amd64.deb
wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/libigdgmm12_22.3.11_amd64.deb
sudo dpkg -i *.deb
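
Note that the two *-dbgsym_*.ddeb files are optional debug symbols and are not matched by the *.deb glob above. After installing, you can roughly verify the update took effect inside the container like this:

# The GPU entries reported by sycl-ls should now show the 23.43.27642.18 /
# 1.3.27642 runtime versions instead of the older ones
sycl-ls

Restarting the ollama serve process (or the whole container) afterwards is probably needed so it picks up the updated runtime.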

Thank you for pointing it out. We will improve it over time :)