cernyjan opened 1 month ago
The problem was in the ending and format of the file: after deleting the blank trailing newline and converting from CRLF to LF, it started to work. :)
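For reference, this is roughly the cleanup that did it; a minimal sketch assuming GNU sed, with start-ollama.sh as a hypothetical stand-in for whichever file was affected:

FILE=start-ollama.sh   # hypothetical name; substitute the affected file

# Convert CRLF line endings to LF in place (GNU sed)
sed -i 's/\r$//' "$FILE"

# Strip trailing blank lines, leaving exactly one final newline
# ("$(cat ...)" drops all trailing newlines; printf re-adds one)
printf '%s\n' "$(cat "$FILE")" > "$FILE"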
Anyway, chat with ollama still does not work at the very end. :(
Unfortunately I also see that Intel integrated graphics for 13th Gen processors and older ("Intel(R) Iris(R) Xe Graphics") is not supported. It does appear to work with 14th Gen processors and their integrated graphics. I can update the README.md to clarify these findings.
The integrated Iris Xe GPU on a 12th Gen Intel Core i7-12700H (Alder Lake-P GT2 [Iris Xe Graphics], driver: i915) works fine.
If you run it on Linux and have two GPUs like me, only one GPU can be provided to the ollama service at a time to get it working.
$ inxi -G
Graphics:
Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] driver: i915 v: kernel
Device-2: Intel DG2 [Arc A770M] driver: i915 v: kernel
The device should be mapped explicitly in docker-compose.yml, instead of mapping the full /dev/dri.
List system devices:
$ lsgpu
card1 Intel Dg2 (Gen12) drm:/dev/dri/card1
└─renderD129 drm:/dev/dri/renderD129
card0 Intel Alderlake_p (Gen12) drm:/dev/dri/card0
└─renderD128 drm:/dev/dri/renderD128
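If lsgpu is not available, the same card-to-render-node mapping can be read from sysfs/udev; the device names below match my listing and may differ on your machine:

# Render nodes appear next to their card under the same PCI path
ls -l /dev/dri/by-path/

# Or ask udev which PCI device backs a specific render node
udevadm info --query=path --name=/dev/dri/renderD129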
Docker Compose configuration:
services:
  ollama-intel-gpu:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ollama-intel-gpu
    image: ollama-intel-gpu:latest
    restart: always
    devices:
      - /dev/dri/renderD129:/dev/dri/renderD129
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ollama-intel-gpu:/root/.ollama
volumes:
  # declared here so the excerpt works standalone
  ollama-intel-gpu: {}
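To sanity-check that the container really sees only the mapped device, something like this should work (assuming the image ships oneAPI's sycl-ls, which the Dockerfile here appears to install):

docker compose up -d
docker exec ollama-intel-gpu ls -l /dev/dri

# List the SYCL devices visible inside the container
# (if sycl-ls is not on PATH, source /opt/intel/oneapi/setvars.sh first)
docker exec ollama-intel-gpu sycl-ls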
I would like to note that the inference speed through the integrated GPU, judged visually, is less than two times slower, which is still an excellent result compared to the speed on a bare CPU.
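To put numbers on that visual estimate, ollama can report token rates itself; for example (llama3.2 is just a placeholder model name):

# --verbose prints timing stats, including the eval rate in tokens/s
docker exec -it ollama-intel-gpu ollama run --verbose llama3.2 "Say hello."

Running the same prompt against each setup makes the iGPU/dGPU/CPU comparison concrete.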
If you want to use both GPUs at once, this is how to run them in parallel:
services:
  ollama-intel-arc-gpu:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ollama-intel-arc-gpu
    image: ollama-intel-gpu:latest
    restart: always
    shm_size: "32gb" # <-- not sure if it helps
    #privileged: true # <-- don't do this, otherwise all of /dev/dri is exposed to the container
    devices:
      #- /dev/dri:/dev/dri
      - /dev/dri/renderD129:/dev/dri/renderD129
      #- /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ollama-intel-gpu:/root/.ollama
    environment:
      - DISPLAY=${DISPLAY}
    env_file:
      - .env
  ollama-intel-cpu-gpu:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: ollama-intel-cpu-gpu
    image: ollama-intel-gpu:latest
    restart: always
    shm_size: "32gb" # <-- not sure if it helps
    #privileged: true # <-- don't do this, otherwise all of /dev/dri is exposed to the container
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix
      - ollama-intel-gpu:/root/.ollama
    environment:
      - DISPLAY=${DISPLAY}
    env_file:
      - .env
  ollama-webui:
    image: ghcr.io/open-webui/open-webui:v0.3.10
    container_name: ollama-webui
    volumes:
      - ollama-webui:/app/backend/data
    depends_on:
      - ollama-intel-arc-gpu
      - ollama-intel-cpu-gpu
    ports:
      - ${OLLAMA_WEBUI_PORT-3000}:8080
    environment:
      - OLLAMA_BASE_URL=http://ollama-intel-arc-gpu:11434;http://ollama-intel-cpu-gpu:11434
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped
volumes:
  ollama-webui: {}
  ollama-intel-gpu: {}
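With both services up, each backend can be checked independently from the host; the commands below assume the ollama binary is on PATH inside the image:

# Each container runs its own server on its own localhost:11434
docker exec ollama-intel-arc-gpu ollama list
docker exec ollama-intel-cpu-gpu ollama list

Open WebUI should then spread requests across the two URLs given in OLLAMA_BASE_URL, and both instances share the same ollama-intel-gpu volume, so models are stored only once.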
I installed (downloaded) llama3.2 1b and I get the Ollama 500 error. I would install another model if I knew for sure that it would work; my data cap is coming close, so all these extra models will literally add up. Any suggestions? Running an Intel Arc A770 16 GB, driver is good, etc. Thanks in advance.
@BDDwaCT, yesterday I updated libraries in main. Could you rebuild?
What time UTC did you do this? I freshly downloaded at approximately 11:00 PM UTC on 10/24/24.
Just curious if you had already completed the update. Also, should I try a different library, or is it something else in your opinion?
I know I said libraries in my last comment; what I meant was a different LLM model. Thanks
> I installed (downloaded) llama3.2 1b and I get the Ollama 500 error. I would install another model if I knew for sure that it would work; my data cap is coming close, so all these extra models will literally add up. Any suggestions? Running an Intel Arc A770 16 GB, driver is good, etc. Thanks in advance.
I'm reproducing the same issue with a fresh pull of this repo. In the log I see this error: "ollama_llama_server: error while loading shared libraries: libmkl_sycl_blas.so.4: cannot open shared object file: No such file or directory". But in the container I see:
root@10637c90384e:/opt/intel/oneapi# find . -name libmkl_sycl_blas.so*
./2025.0/lib/libmkl_sycl_blas.so
./2025.0/lib/libmkl_sycl_blas.so.5
./mkl/2025.0/lib/libmkl_sycl_blas.so
./mkl/2025.0/lib/libmkl_sycl_blas.so.5
This seems to be a library version mismatch between ipex-llm and oneAPI.
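If the versions cannot be aligned right away, one unsupported stopgap is a compatibility symlink inside the container, using the paths from the find output above; this assumes the .so.5 ABI is close enough to .so.4, which is not guaranteed, so treat it as a diagnostic hack rather than a fix:

# Hack only: point the missing .so.4 name at the installed .so.5
# (the real fix is matching ipex-llm and oneAPI versions)
ln -s /opt/intel/oneapi/mkl/2025.0/lib/libmkl_sycl_blas.so.5 \
      /opt/intel/oneapi/mkl/2025.0/lib/libmkl_sycl_blas.so.4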
@BDDwaCT can you try the fix in https://github.com/mattcurf/ollama-intel-gpu/pull/6 and report back if that resolves your issue?
EDIT: Please see my comment in #6. Thanks
I will try here in just a minute. However, I just wanted to share that inside my /opt/intel/oneapi, as of right this minute, it shows this:
root@af5042ac6a4d:/opt/intel/oneapi# find . -name libmkl_sycl_blas.so*
./2024.2/lib/libmkl_sycl_blas.so
./2024.2/lib/libmkl_sycl_blas.so.4
./mkl/2024.2/lib/libmkl_sycl_blas.so
./mkl/2024.2/lib/libmkl_sycl_blas.so.4
Just FYI. Now I will go and try the fix in #6 mentioned above and will report back. Thanks
Hi, I am facing this problem after running 'docker-compose -f docker-compose-wsl2.yml up' on a Windows notebook with an Intel GPU:
Please help, thank you in advance.
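Without the error text, only generic first steps apply; on WSL2 the GPU is exposed to Linux as /dev/dxg, and the container log usually shows why ollama failed (service name assumed to match the Linux compose file above):

# Capture the failing service's log output
docker-compose -f docker-compose-wsl2.yml logs ollama-intel-gpu

# Confirm WSL2 exposes the GPU paravirtualization device
ls -l /dev/dxg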