mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed, P2P inference
https://localai.io
MIT License

CLBlast support #404

Closed: EchedelleLR closed this issue 11 months ago

EchedelleLR commented 1 year ago

Is your feature request related to a problem? Please describe.
No support for OpenCL.

Describe the solution you'd like
Implementation of CLBlast to provide support for OpenCL.

Describe alternatives you've considered
None, as implementing something for ROCm (if it were something more dedicated) would leave out pre-RDNA AMD graphics cards, several Intel integrated GPUs, etc.

Additional context
Mostly wanted for AMD and Intel iGPUs.

mudler commented 1 year ago

For context: https://github.com/ggerganov/llama.cpp/commit/2e6cd4b02549e343bef3768e6b946f999c82e823

EchedelleLR commented 1 year ago

Doesn't https://github.com/go-skynet/LocalAI/blob/master/.env#LL17C1-L18C22 need to be changed somewhat as well?

mudler commented 1 year ago

Right, sorry - I need to update the build instructions, and I will make them part of the website directly. I didn't add the deps to the container image, but you should be able to build locally from master with:

make BUILD_TYPE=clblast build

EchedelleLR commented 1 year ago

Okay, I will wait for the Docker image to include the dependency changes.

I have a MiniPC with an Intel iGPU supporting OpenCL where I could test directly.

lenaxia commented 1 year ago

I've got CLBlast building; however, it still looks like nothing is being offloaded to the GPU.

It is running in Kubernetes with LocalAI v1.17.1.

Helm Release: https://github.com/lenaxia/home-ops-prod/blob/7de498d37a6141b45def584f346755c832c0cd57/cluster/apps/home/localai/app/helm-release.yaml

env:

    env:
    - name: THREADS
      value: 8
    - name: CONTEXT_SIZE
      value: 512
    - name: MODELS_PATH
      value: "/models"
    - name: IMAGE_PATH
      value: /tmp
    - name: BUILD_TYPE
      value: clblast
    - name: GO_TAGS
      value: stablediffusion
    - name: DEBUG
      value: "true"

kubectl describe pod

Name:         localai-7c768dbc8c-dn4g9
Namespace:    home
Priority:     0
Node:         k3s-worker-03/192.168.3.23
Start Time:   Sun, 04 Jun 2023 06:51:02 +0000
Labels:       app.kubernetes.io/instance=localai
              app.kubernetes.io/name=localai
              pod-template-hash=7c768dbc8c
Annotations:  <none>
Status:       Running
IP:           10.42.3.130
IPs:
  IP:           10.42.3.130
Controlled By:  ReplicaSet/localai-7c768dbc8c
Init Containers:
  download-model:
    Container ID:  containerd://5084bafa7e285fbd76233661a1d88665e885f29aca053185db7cf494c33660c7
    Image:         busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
    Image ID:      docker.io/library/busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      ## A simpler and more secure way if you have a way of staging an archive with the files you need
      #wget "https://s3.domain.tld/public/stablediffusion.tar" -P /tmp
      #tar -xzvf /tmp/stablediffusion.tar -C $MODELS_PATH
      #rm -rf /tmp/stablediffusion.tar

      ## A more general and less secure way that grab all the files from github
      ## Details here: https://github.com/go-skynet/LocalAI
      ## And here: https://github.com/lenaxia/stablediffusion-bins/releases/tag/2023.05.24
      mkdir $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/UNetModel-MHA-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/FrozenCLIPEmbedder-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-encoder-512-512-fp16.bin" -P $MODELS_PATH/stablediffusion_assets

      cat << "EOF" >> $MODELS_PATH/stablediffusion.yaml
      name: stablediffusion
      backend: stablediffusion
      asset_dir: stablediffusion_assets
      EOF

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Jun 2023 06:51:03 +0000
      Finished:     Sun, 04 Jun 2023 06:51:08 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      URL:          https://gpt4all.io/models/ggml-gpt4all-j.bin
      MODELS_PATH:  /models
    Mounts:
      /models from models (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhrl8 (ro)
Containers:
  localai:
    Container ID:   containerd://9537d1c709629992ea54d54be3939845d97eeb9b3ce9c226b26a57399c3a6ff0
    Image:          quay.io/go-skynet/local-ai:v1.17.1
    Image ID:       quay.io/go-skynet/local-ai@sha256:589f2d985aae9baca0813cb3282d2fe4a68d0a4dc2c7f352009f5941fa45c9ed
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 04 Jun 2023 06:57:34 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Sun, 04 Jun 2023 06:51:08 +0000
      Finished:     Sun, 04 Jun 2023 06:57:33 +0000
    Ready:          True
    Restart Count:  1
    Limits:
      gpu.intel.com/i915:  1
      memory:              40000Mi
    Requests:
      cpu:                 200m
      gpu.intel.com/i915:  1
      memory:              2000Mi
    Liveness:              http-get http://:8080/healthz delay=240s timeout=1s period=30s #success=1 #failure=4
    Readiness:             http-get http://:8080/readyz delay=240s timeout=1s period=30s #success=1 #failure=4
    Environment:
      THREADS:       8
      CONTEXT_SIZE:  512
      MODELS_PATH:   /models
      IMAGE_PATH:    /tmp
      BUILD_TYPE:    clblast
      GO_TAGS:       stablediffusion
      DEBUG:         true
    Mounts:
      /models from models (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhrl8 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  models:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  localai-models
    ReadOnly:   false
  kube-api-access-jhrl8:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/worker=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  16m                  default-scheduler  Successfully assigned home/localai-7c768dbc8c-dn4g9 to k3s-worker-03
  Normal   Pulled     16m                  kubelet            Container image "busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16" already present on machine
  Normal   Created    16m                  kubelet            Created container download-model
  Normal   Started    16m                  kubelet            Started container download-model
  Normal   Killing    10m                  kubelet            Container localai failed liveness probe, will be restarted
  Normal   Pulled     9m36s (x2 over 16m)  kubelet            Container image "quay.io/go-skynet/local-ai:v1.17.1" already present on machine
  Normal   Created    9m36s (x2 over 16m)  kubelet            Created container localai
  Normal   Started    9m35s (x2 over 16m)  kubelet            Started container localai
  Warning  Unhealthy  4m6s (x10 over 11m)  kubelet            Readiness probe failed: Get "http://10.42.3.130:8080/readyz": dial tcp 10.42.3.130:8080: connect: connection refused
  Warning  Unhealthy  4m6s (x7 over 11m)   kubelet            Liveness probe failed: Get "http://10.42.3.130:8080/healthz": dial tcp 10.42.3.130:8080: connect: connection refused

Logs:

I local-ai build info:
I BUILD_TYPE: clblast
I GO_TAGS: stablediffusion
CGO_LDFLAGS="" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models
7:03AM DBG Model: gpt4all-j (config: {OpenAIRequest:{Model:ggml-gpt4all-j File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:gpt4all-j StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:gpt4all-j TemplateConfig:{Completion:gpt4all-completion Chat:gpt4all-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
7:03AM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.46.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 25  Processes ........... 1 │
 │ Prefork ....... Disabled  PID .............. 8042 │
 └───────────────────────────────────────────────────┘

7:04AM DBG Request received: {"model":"vicuna","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0.9,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
7:04AM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
7:04AM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
How are you?

### Response:
7:04AM DBG Loading model llama from vicuna
7:04AM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 10583.26 MB (+ 3216.00 MB per state)
.
llama_init_from_file: kv self size  = 1600.00 MB

llama_print_timings:        load time = 92005.19 ms
llama_print_timings:      sample time =    14.75 ms /    17 runs   (    0.87 ms per token)
llama_print_timings: prompt eval time = 16637.22 ms /    39 tokens (  426.60 ms per token)
llama_print_timings:        eval time =  8085.18 ms /    16 runs   (  505.32 ms per token)
llama_print_timings:       total time = 114330.76 ms
7:06AM DBG Response: {"object":"chat.completion","model":"vicuna","choices":[{"message":{"role":"assistant","content":"I am doing well, thank you for asking. And how about you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[10.42.3.8]:51328  200  -  POST     /v1/chat/completions

As an example, looking here: https://youtu.be/tZ8uOHNELIU?t=30, it looks like I'm missing the CLBlast initialization and the log lines reporting offloading to the GPU.

mudler commented 1 year ago

I think that's a typo in the docs; it should be BUILD_TYPE=clblas.
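With that change, the local build command from the earlier comment becomes:

make BUILD_TYPE=clblas build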

lenaxia commented 1 year ago

Changing BUILD_TYPE to clblas results in an error:

env:

    env:
    - name: THREADS
      value: 8
    - name: CONTEXT_SIZE
      value: 512
    - name: MODELS_PATH
      value: "/models"
    - name: IMAGE_PATH
      value: /tmp
    - name: BUILD_TYPE
      value: clblas
    - name: GO_TAGS
      value: stablediffusion
    - name: DEBUG
      value: "true"

Error Log:

I local-ai build info:
I BUILD_TYPE: clblas
I GO_TAGS: stablediffusion
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.46.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 25  Processes ........... 1 │
 │ Prefork ....... Disabled  PID .............. 8029 │
 └───────────────────────────────────────────────────┘

8:41AM DBG Model: stablediffusion (config: {OpenAIRequest:{Model: File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:stablediffusion StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:0 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:stablediffusion TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets:stablediffusion_assets PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
8:41AM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
[10.42.5.1]:49684  200  -  GET      /readyz
[10.42.5.1]:49670  200  -  GET      /healthz
[10.42.3.8]:37058  200  -  GET      /v1/models
8:42AM DBG Request received: {"model":"vicuna","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0.9,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
8:42AM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
8:42AM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
How are you?

### Response:
8:42AM DBG Loading model llama from vicuna
8:42AM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at /build/go-llama/llama.cpp/ggml-opencl.cpp:344

kubectl describe pod:

Name:         localai-5c9b777d9f-hvmhc
Namespace:    home
Priority:     0
Node:         k3s-worker-00/192.168.3.20
Start Time:   Sun, 04 Jun 2023 08:27:12 +0000
Labels:       app.kubernetes.io/instance=localai
              app.kubernetes.io/name=localai
              pod-template-hash=5c9b777d9f
Annotations:  <none>
Status:       Running
IP:           10.42.5.135
IPs:
  IP:           10.42.5.135
Controlled By:  ReplicaSet/localai-5c9b777d9f
Init Containers:
  download-model:
    Container ID:  containerd://6aac26a3e30bafc894fdd2e32b727a74dc09ae93e7d0af48e16b1fa243a08d2f
    Image:         busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
    Image ID:      docker.io/library/busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      ## A simpler and more secure way if you have a way of staging an archive with the files you need
      #wget "https://s3.domain.tld/public/stablediffusion.tar" -P /tmp
      #tar -xzvf /tmp/stablediffusion.tar -C $MODELS_PATH
      #rm -rf /tmp/stablediffusion.tar

      ## A more general and less secure way that grab all the files from github
      ## Details here: https://github.com/go-skynet/LocalAI
      ## And here: https://github.com/lenaxia/stablediffusion-bins/releases/tag/2023.05.24
      mkdir $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/UNetModel-MHA-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/FrozenCLIPEmbedder-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
      wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-encoder-512-512-fp16.bin" -P $MODELS_PATH/stablediffusion_assets

      cat << "EOF" >> $MODELS_PATH/stablediffusion.yaml
      name: stablediffusion
      backend: stablediffusion
      asset_dir: stablediffusion_assets
      EOF

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 04 Jun 2023 08:27:28 +0000
      Finished:     Sun, 04 Jun 2023 08:28:35 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      URL:          https://gpt4all.io/models/ggml-gpt4all-j.bin
      MODELS_PATH:  /models
    Mounts:
      /models from models (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rxq5d (ro)
Containers:
  localai:
    Container ID:   containerd://c75d96c86aa0bad4a9cd40da20ce89e46427daeb8eba8618c2a7e3599f3b065b
    Image:          quay.io/go-skynet/local-ai:v1.17.1
    Image ID:       quay.io/go-skynet/local-ai@sha256:589f2d985aae9baca0813cb3282d2fe4a68d0a4dc2c7f352009f5941fa45c9ed
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 04 Jun 2023 08:42:52 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sun, 04 Jun 2023 08:37:27 +0000
      Finished:     Sun, 04 Jun 2023 08:42:39 +0000
    Ready:          False
    Restart Count:  2
    Limits:
      gpu.intel.com/i915:  1
      memory:              40000Mi
    Requests:
      cpu:                 200m
      gpu.intel.com/i915:  1
      memory:              2000Mi
    Liveness:              http-get http://:8080/healthz delay=300s timeout=1s period=30s #success=1 #failure=4
    Readiness:             http-get http://:8080/readyz delay=300s timeout=1s period=30s #success=1 #failure=4
    Environment:
      THREADS:       8
      CONTEXT_SIZE:  512
      MODELS_PATH:   /models
      IMAGE_PATH:    /tmp
      BUILD_TYPE:    clblas
      GO_TAGS:       stablediffusion
      DEBUG:         true
    Mounts:
      /models from models (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rxq5d (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  models:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  localai-models
    ReadOnly:   false
  kube-api-access-rxq5d:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              node-role.kubernetes.io/worker=true
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Warning  FailedScheduling        17m                 default-scheduler        0/7 nodes are available: 7 pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.
  Warning  FailedScheduling        17m                 default-scheduler        0/7 nodes are available: 7 pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.
  Normal   Scheduled               17m                 default-scheduler        Successfully assigned home/localai-5c9b777d9f-hvmhc to k3s-worker-00
  Normal   SuccessfulAttachVolume  17m                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-97dd0f23-9834-4f4b-9516-65ef13ddafc6"
  Normal   Pulled                  17m                 kubelet                  Container image "busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16" already present on machine
  Normal   Created                 17m                 kubelet                  Created container download-model
  Normal   Started                 17m                 kubelet                  Started container download-model
  Warning  BackOff                 2m15s               kubelet                  Back-off restarting failed container
  Normal   Pulled                  2m2s (x3 over 16m)  kubelet                  Container image "quay.io/go-skynet/local-ai:v1.17.1" already present on machine
  Normal   Created                 2m2s (x3 over 16m)  kubelet                  Created container localai
  Normal   Started                 2m2s (x3 over 16m)  kubelet                  Started container localai

mudler commented 1 year ago

If you are running this on Kubernetes, you'd need to expose your /dev/dri device to the pod. However, I haven't tested this inside Kubernetes yet, so I can't give you specific instructions to follow here.
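For illustration only, here is a rough, untested sketch of what exposing /dev/dri to the container could look like in a pod spec; when the Intel GPU device plugin is installed, requesting the gpu.intel.com/i915 resource (as the manifests above already do) is the cleaner route:

    # Untested sketch: expose the host's /dev/dri to the LocalAI container via a hostPath volume.
    spec:
      containers:
        - name: localai
          securityContext:
            privileged: true   # raw device access without a device plugin
          volumeMounts:
            - name: dri
              mountPath: /dev/dri
      volumes:
        - name: dri
          hostPath:
            path: /dev/dri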

lenaxia commented 1 year ago

Whelp, I got it working; it requires the installation of OpenCL drivers. I'll get documentation for it written up so we can add it to the LocalAI.io docs.
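As a rough sketch of what installing those drivers could look like inside the container, assuming a Debian/Ubuntu-based image (the package names are an assumption and may differ per distribution):

    # Untested sketch: install the generic OpenCL ICD loader and the Intel OpenCL runtime,
    # then check that a platform is visible (clGetPlatformIDs should no longer return -1001).
    apt-get update
    apt-get install -y ocl-icd-libopencl1 intel-opencl-icd clinfo
    clinfo

With the drivers in place, the working log looks like this: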

I local-ai build info:
I BUILD_TYPE: clblas
I GO_TAGS: stablediffusion
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models
6:01PM DBG Model: gpt4all-j (config: {OpenAIRequest:{Model:ggml-gpt4all-j File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:gpt4all-j StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:gpt4all-j TemplateConfig:{Completion:gpt4all-completion Chat:gpt4all-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
6:01PM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:32 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})

 ┌───────────────────────────────────────────────────┐
 │                   Fiber v2.46.0                   │
 │               http://127.0.0.1:8080               │
 │       (bound on host 0.0.0.0 and port 8080)       │
 │                                                   │
 │ Handlers ............ 25  Processes ........... 1 │
 │ Prefork ....... Disabled  PID .............. 8081 │
 └───────────────────────────────────────────────────┘

:0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
6:10PM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:32 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
6:10PM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
How are you?

### Response:
6:10PM DBG Loading model llama from vicuna
6:10PM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 1024
llama_model_load_internal: n_embd     = 5120
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 40
llama_model_load_internal: n_layer    = 40
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 8 (mostly Q5_0)
llama_model_load_internal: n_ff       = 13824
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size =    0.09 MB
ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) HD Graphics 530'
ggml_opencl: device FP16 support: true
llama_model_load_internal: mem required  = 10583.26 MB (+ 3216.00 MB per state)
ggml_opencl: offloading 32 layers to GPU
ggml_opencl: total VRAM used: 6655 MB
.
llama_init_from_file: kv self size  = 1600.00 MB

llama_print_timings:        load time = 55941.34 ms
llama_print_timings:      sample time =    16.78 ms /    17 runs   (    0.99 ms per token)
llama_print_timings: prompt eval time = 255453.86 ms /    39 tokens ( 6550.10 ms per token)
llama_print_timings:        eval time = 28250.62 ms /    16 runs   ( 1765.66 ms per token)
llama_print_timings:       total time = 289040.55 ms
6:14PM DBG Response: {"object":"chat.completion","model":"vicuna","choices":[{"message":{"role":"assistant","content":"I am doing well, thank you for asking. And how about you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[10.42.3.8]:51328  200  -  POST     /v1/chat/completions

But seriously, this Intel iGPU sucks. It's so much slower than even just the CPU.

4 threads, OpenBLAS:

llama_print_timings:        load time = 88639.48 ms
llama_print_timings:      sample time =    12.25 ms /    17 runs   (    0.72 ms per token)
llama_print_timings: prompt eval time = 13018.65 ms /    39 tokens (  333.81 ms per token)
llama_print_timings:        eval time =  7248.51 ms /    16 runs   (  453.03 ms per token)
llama_print_timings:       total time = 48560.37 ms

localai-bot commented 1 year ago

:warning::warning::warning::warning::warning:

Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler), so please beware that I might hallucinate sometimes!

but... I can also be funny or helpful :smilecat: and I can, generally speaking, provide good tips or point to places in the documentation or in the code to look at, based on what you wrote in the issue.

Don't engage in conversation with me; I don't support replying (yet)!

:warning::warning::warning::warning::warning:

To provide support for OpenCL, you can build LocalAI with BUILD_TYPE=clblas. This will use CLBlast to provide OpenCL GPU acceleration. You will also need to specify a CLBlast directory using the environment variable CLBLAST_DIR. Here is an example command to build LocalAI with CLBlast support:

export BUILD_TYPE=clblas
export CLBLAST_DIR=/path/to/clblast
make BUILD_TYPE=clblas build

Note that you will need to have OpenCL installed and set up on your system before building LocalAI. Additionally, if you want to use the pre-configured models feature, you can specify a list of models in the environment variable PRELOAD_MODELS. Each model should have a URL to its YAML configuration file and a name to use for the model. For example:

export PRELOAD_MODELS=["{\"url\": \"github:go-skynet/model-gallery/gpt4all-j.yaml\", \"name\": \"gpt-3.5-turbo\"}"]

This will automatically download and load the gpt4all-j model as "gpt-3.5-turbo" when starting LocalAI.
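For example, once the model is loaded it can be queried through the OpenAI-compatible endpoint (assuming LocalAI is listening on localhost:8080, as in the logs above):

    curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "How are you?"}]}'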


sammcj commented 1 year ago

I think it would be good if the instructions around getting CLBlast to work were updated.

So far, on Fedora I can't get LocalAI with CLBlast to build: the make fails with ld complaining about missing libraries, despite the build taking place within the official Docker container, with CLBlast installed, mounted as a volume, and with CLBLAST_DIR set.

...
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-ggml-transformers LIBRARY_PATH=/build/go-ggml-transformers \
go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=v1.22.0-10-g0883d32" -X "github.com/go-skynet/LocalAI/internal.Commit=0883d324d9b29b12e8417aa20d6458a77f62aab1"" -tags "" -o backend-assets/grpc/falcon-ggml ./cmd/grpc/falcon-ggml/
# github.com/go-skynet/LocalAI/cmd/grpc/falcon-ggml
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
collect2: error: ld returned 1 exit status

make: *** [Makefile:387: backend-assets/grpc/falcon-ggml] Error 1

env:

BUILD_TYPE=clblas
CLBLAST_DIR=/mnt/clblast

docker-compose:

  localai:
    image: quay.io/go-skynet/local-ai:master
...
    build:
      context: ${MOUNT_DOCKER_DATA}/LocalAI/git/
      dockerfile: Dockerfile
    ports:
      - 8888:8080
    env_file:
      - localai/env
    volumes:
      - ${MOUNT_DOCKER_DATA}/LocalAI/models:/models
      - ${MOUNT_DOCKER_DATA}/LocalAI/clblast:/mnt/clblast
...

clblast:

/opt/docker-data/LocalAI/clblast

ls -1

bin/
include/
lib/

ls -1 bin/

clblast_sample_cache_c*
clblast_sample_dgemv_c*
clblast_sample_dtrsm*
clblast_sample_haxpy_c*
clblast_sample_samax_c*
clblast_sample_sasum_c*
clblast_sample_sgemm*
clblast_sample_sgemm_batched*
clblast_sample_sgemm_c*
clblast_sample_tuning_api*
clblast_tuner_copy_fast*
clblast_tuner_copy_pad*
clblast_tuner_invert*
clblast_tuner_routine_xgemm*
clblast_tuner_routine_xtrsv*
clblast_tuner_transpose_fast*
clblast_tuner_transpose_pad*
clblast_tuner_xaxpy*
clblast_tuner_xconvgemm*
clblast_tuner_xdot*
clblast_tuner_xgemm*
clblast_tuner_xgemm_direct*
clblast_tuner_xgemv*
clblast_tuner_xger*

ls -1 lib/

cmake/
libclblast.so -> libclblast.so.1
libclblast.so.1 -> libclblast.so.1.6.1
libclblast.so.1.6.1
pkgconfig/

elementalest commented 11 months ago

I have run into the exact same issue, but using make manually on an Ubuntu 22.04 system (no Docker). I used the following steps (following https://localai.io/basics/build/index.html where applicable):

mkdir -p ~/src && cd ~/src

INSTALL_PREFIX=~/src/install
mkdir -p ${INSTALL_PREFIX}

git clone --recurse-submodules https://github.com/KhronosGroup/OpenCL-SDK.git

mkdir -p OpenCL-SDK/build
cd OpenCL-SDK/build

cmake .. -DBUILD_DOCS=OFF -DBUILD_EXAMPLES=OFF -DBUILD_TESTING=OFF -DOPENCL_SDK_BUILD_SAMPLES=OFF -DOPENCL_SDK_TEST_SAMPLES=OFF
cmake --build . --config Release -j 7
cmake --install . --prefix ${INSTALL_PREFIX}

cd ~/src

git clone https://github.com/CNugteren/CLBlast.git

mkdir -p CLBlast/build
cd CLBlast/build

cmake .. -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF -DOPENCL_ROOT=${INSTALL_PREFIX}
cmake --build . --config Release -j 7
cmake --install . --prefix ${INSTALL_PREFIX}

cd ~/src

git clone --recurse-submodules -j8 https://github.com/mudler/LocalAI.git
cd LocalAI

CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" CLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/ make BUILD_TYPE=clblas GO_TAGS=tts build

The full log is here, but it fails at the same point as @sammcj's.

I also tried building with the following (making sure to make clean first):

CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_CLBLAST=ON -DCLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/" CLBLAST_DIR=${INSTALL_PREFIX}/lib make BUILD_TYPE=clblas GO_TAGS=tts build

I also tried with CLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/.

~/src/install/lib$ ls -1
cmake/
libclblast.a
libOpenCLExt.a
libOpenCL.so
libOpenCL.so.1
libOpenCL.so.1.2
libOpenCLUtils.a
libOpenCLUtilsCpp.a
pkgconfig/

I also tried removing falcon-ggml from the Makefile to see if I could work around it, but I still ran into linker issues with OpenCL and CLBlast with the next target in the Makefile:

...
CGO_LDFLAGS=" -lOpenCL -lclblast" C_INCLUDE_PATH=~/src/LocalAI/sources/go-bert LIBRARY_PATH=~/src/LocalAI/sources/go-bert \
go build -ldflags " -X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0-16-g89ff123" -X "github.com/go-skynet/LocalAI/internal.Commit=89ff12309daac0a9a4f6e85b7cfc3833995d4e82"" -tags "tts" -o backend-assets/grpc/bert-embeddings ./backend/go/llm/bert/
# github.com/go-skynet/go-bert.cpp
# github.com/go-skynet/LocalAI/backend/go/llm/bert
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
collect2: error: ld returned 1 exit status

make: *** [Makefile:512: backend-assets/grpc/bert-embeddings] Error 1

EDIT: I am able to build and run llama.cpp with CLBlast without issue.