Running Docker ollama with Igpu keeps failing to generate.

intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.

Apache License 2.0

6.55k stars 1.25k forks source link

When I try to run ollama with docker everytime it's asked to generate via API i see

msg="error loading llama server" error="llama runner process has terminated: signal: aborted (core dumped)"

I am running this on Ubuntu 24.04. I know the docker image has Intel packages for oneapi but should the host have them installed as well? I couldn't find docs suggesting that.

Vainfo is the latest offered by apt on my system.

vainfo: VA-API version: 1.20 (libva 2.12.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 24.1.0 ()

Docker logs ollama_logs (5).txt

sycl-ls

[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device OpenCL 1.2  [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz OpenCL 3.0 (Build 0) [2023.16.12.0.12_195853.xmain-hotfix]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Iris(R) Xe Graphics OpenCL 3.0 NEO  [23.35.27191.42]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Iris(R) Xe Graphics
1.3 [1.3.26241]

Docker compose - I realized a lot of extra work was needed to figure out how to actually start ollama for docker as the docs showed nothing.

  ollama:
    image: intelanalytics/ipex-llm-inference-cpp-xpu:latest
    container_name: ollama
    restart: unless-stopped
    command: /bin/bash -c "cd /llm/scripts && source ipex-llm-init --gpu --device iGPU && init-ollama && ./ollama serve"
    networks:
      traefik-network:
        ipv4_address: 192.168.10.71
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /docker_container_volumes/ollama:/root/.ollama
    environment:
     # - no_proxy=localhost,127.0.0.1
      - bench_model=mistral-7b-v0.1.Q4_0.gguf
      - DEVICE=iGPU
      - OLLAMA_HOST=0.0.0.0
      - OLLAMA_NUM_GPU=999
      - OLLAMA_INTEL_GPU=1
      - GIN_MODE=release
      - NEOReadDebugKeys=1
      - OverrideGpuAddressSpace=48
      - ZES_ENABLE_SYSMAN=1
    shm_size: '16g'
    mem_limit: '8g'
    tty: true
    stdin_open: true

mkdir neo cd neo wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.15468.11/intel-igc-core_1.0.15468.11_amd64.deb wget https://github.com/intel/intel-graphics-compiler/releases/download/igc-1.0.15468.11/intel-igc-opencl_1.0.15468.11_amd64.deb wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-level-zero-gpu-dbgsym_1.3.27642.18_amd64.ddeb wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-level-zero-gpu_1.3.27642.18_amd64.deb wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-opencl-icd-dbgsym_23.43.27642.18_amd64.ddeb wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/intel-opencl-icd_23.43.27642.18_amd64.deb wget https://github.com/intel/compute-runtime/releases/download/23.43.27642.18/libigdgmm12_22.3.11_amd64.deb sudo dpkg -i *.deb

intel-analytics / ipex-llm

Running Docker ollama with Igpu keeps failing to generate. #12116