Closed: EchedelleLR closed this issue 11 months ago.
Doesn't https://github.com/go-skynet/LocalAI/blob/master/.env#LL17C1-L18C22 need to be changed somewhat as well?
Right, sorry - I need to update the building instructions, but I will make that part of the website directly. I didn't add the deps as part of the container image, but you should be able to build locally from master with:
make BUILD_TYPE=clblast build
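A minimal sketch of the missing deps, assuming a Debian/Ubuntu environment (package names are an assumption here, not yet part of the official docs):
# Hedged sketch: CLBlast build prerequisites on Debian/Ubuntu.
# ocl-icd-opencl-dev provides the OpenCL ICD loader (libOpenCL.so) and headers,
# libclblast-dev provides the CLBlast headers and library.
apt-get update
apt-get install -y ocl-icd-opencl-dev libclblast-dev
make BUILD_TYPE=clblast build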
Okay, I will wait for the dependency changes in the Docker image.
I have a MiniPC with an Intel iGPU supporting OpenCL where I could test directly.
I've got clblast building; however, it still looks like nothing is being offloaded to the GPU.
It is running in Kubernetes with LocalAI v1.17.1
env:
env:
- name: THREADS
value: 8
- name: CONTEXT_SIZE
value: 512
- name: MODELS_PATH
value: "/models"
- name: IMAGE_PATH
value: /tmp
- name: BUILD_TYPE
value: clblast
- name: GO_TAGS
value: stablediffusion
- name: DEBUG
value: "true"
kubectl describe pod
Name: localai-7c768dbc8c-dn4g9
Namespace: home
Priority: 0
Node: k3s-worker-03/192.168.3.23
Start Time: Sun, 04 Jun 2023 06:51:02 +0000
Labels: app.kubernetes.io/instance=localai
app.kubernetes.io/name=localai
pod-template-hash=7c768dbc8c
Annotations: <none>
Status: Running
IP: 10.42.3.130
IPs:
IP: 10.42.3.130
Controlled By: ReplicaSet/localai-7c768dbc8c
Init Containers:
download-model:
Container ID: containerd://5084bafa7e285fbd76233661a1d88665e885f29aca053185db7cf494c33660c7
Image: busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
Image ID: docker.io/library/busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
## A simpler and more secure way if you have a way of staging an archive with the files you need
#wget "https://s3.domain.tld/public/stablediffusion.tar" -P /tmp
#tar -xzvf /tmp/stablediffusion.tar -C $MODELS_PATH
#rm -rf /tmp/stablediffusion.tar
## A more general and less secure way that grab all the files from github
## Details here: https://github.com/go-skynet/LocalAI
## And here: https://github.com/lenaxia/stablediffusion-bins/releases/tag/2023.05.24
mkdir $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/UNetModel-MHA-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/FrozenCLIPEmbedder-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-encoder-512-512-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
cat << "EOF" >> $MODELS_PATH/stablediffusion.yaml
name: stablediffusion
backend: stablediffusion
asset_dir: stablediffusion_assets
EOF
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 04 Jun 2023 06:51:03 +0000
Finished: Sun, 04 Jun 2023 06:51:08 +0000
Ready: True
Restart Count: 0
Environment:
URL: https://gpt4all.io/models/ggml-gpt4all-j.bin
MODELS_PATH: /models
Mounts:
/models from models (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhrl8 (ro)
Containers:
localai:
Container ID: containerd://9537d1c709629992ea54d54be3939845d97eeb9b3ce9c226b26a57399c3a6ff0
Image: quay.io/go-skynet/local-ai:v1.17.1
Image ID: quay.io/go-skynet/local-ai@sha256:589f2d985aae9baca0813cb3282d2fe4a68d0a4dc2c7f352009f5941fa45c9ed
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 04 Jun 2023 06:57:34 +0000
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Sun, 04 Jun 2023 06:51:08 +0000
Finished: Sun, 04 Jun 2023 06:57:33 +0000
Ready: True
Restart Count: 1
Limits:
gpu.intel.com/i915: 1
memory: 40000Mi
Requests:
cpu: 200m
gpu.intel.com/i915: 1
memory: 2000Mi
Liveness: http-get http://:8080/healthz delay=240s timeout=1s period=30s #success=1 #failure=4
Readiness: http-get http://:8080/readyz delay=240s timeout=1s period=30s #success=1 #failure=4
Environment:
THREADS: 8
CONTEXT_SIZE: 512
MODELS_PATH: /models
IMAGE_PATH: /tmp
BUILD_TYPE: clblast
GO_TAGS: stablediffusion
DEBUG: true
Mounts:
/models from models (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-jhrl8 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
models:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: localai-models
ReadOnly: false
kube-api-access-jhrl8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/worker=true
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 16m default-scheduler Successfully assigned home/localai-7c768dbc8c-dn4g9 to k3s-worker-03
Normal Pulled 16m kubelet Container image "busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16" already present on machine
Normal Created 16m kubelet Created container download-model
Normal Started 16m kubelet Started container download-model
Normal Killing 10m kubelet Container localai failed liveness probe, will be restarted
Normal Pulled 9m36s (x2 over 16m) kubelet Container image "quay.io/go-skynet/local-ai:v1.17.1" already present on machine
Normal Created 9m36s (x2 over 16m) kubelet Created container localai
Normal Started 9m35s (x2 over 16m) kubelet Started container localai
Warning Unhealthy 4m6s (x10 over 11m) kubelet Readiness probe failed: Get "http://10.42.3.130:8080/readyz": dial tcp 10.42.3.130:8080: connect: connection refused
Warning Unhealthy 4m6s (x7 over 11m) kubelet Liveness probe failed: Get "http://10.42.3.130:8080/healthz": dial tcp 10.42.3.130:8080: connect: connection refused
Logs:
I local-ai build info:
I BUILD_TYPE: clblast
I GO_TAGS: stablediffusion
CGO_LDFLAGS="" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models
7:03AM DBG Model: gpt4all-j (config: {OpenAIRequest:{Model:ggml-gpt4all-j File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:gpt4all-j StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:gpt4all-j TemplateConfig:{Completion:gpt4all-completion Chat:gpt4all-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
7:03AM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
┌───────────────────────────────────────────────────┐
│ Fiber v2.46.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 25 Processes ........... 1 │
│ Prefork ....... Disabled PID .............. 8042 │
└───────────────────────────────────────────────────┘
7:04AM DBG Request received: {"model":"vicuna","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0.9,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
7:04AM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
7:04AM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
How are you?
### Response:
7:04AM DBG Loading model llama from vicuna
7:04AM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 8 (mostly Q5_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
llama_model_load_internal: mem required = 10583.26 MB (+ 3216.00 MB per state)
.
llama_init_from_file: kv self size = 1600.00 MB
llama_print_timings: load time = 92005.19 ms
llama_print_timings: sample time = 14.75 ms / 17 runs ( 0.87 ms per token)
llama_print_timings: prompt eval time = 16637.22 ms / 39 tokens ( 426.60 ms per token)
llama_print_timings: eval time = 8085.18 ms / 16 runs ( 505.32 ms per token)
llama_print_timings: total time = 114330.76 ms
7:06AM DBG Response: {"object":"chat.completion","model":"vicuna","choices":[{"message":{"role":"assistant","content":"I am doing well, thank you for asking. And how about you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[10.42.3.8]:51328 200 - POST /v1/chat/completions
As an example, looking here: https://youtu.be/tZ8uOHNELIU?t=30, it looks like I'm missing the CLBlast initialization and the log lines reporting offloading to the GPU.
I think that's a typo in the docs; it should be BUILD_TYPE=clblas.
Changing BUILD_TYPE to clblas results in an error:
env:
env:
- name: THREADS
value: 8
- name: CONTEXT_SIZE
value: 512
- name: MODELS_PATH
value: "/models"
- name: IMAGE_PATH
value: /tmp
- name: BUILD_TYPE
value: clblas
- name: GO_TAGS
value: stablediffusion
- name: DEBUG
value: "true"
Error Log:
I local-ai build info:
I BUILD_TYPE: clblas
I GO_TAGS: stablediffusion
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models
┌───────────────────────────────────────────────────┐
│ Fiber v2.46.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 25 Processes ........... 1 │
│ Prefork ....... Disabled PID .............. 8029 │
└───────────────────────────────────────────────────┘
8:41AM DBG Model: stablediffusion (config: {OpenAIRequest:{Model: File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0 TopK:0 Temperature:0 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:stablediffusion StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:0 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:stablediffusion TemplateConfig:{Completion: Chat: Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets:stablediffusion_assets PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
8:41AM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
[10.42.5.1]:49684 200 - GET /readyz
[10.42.5.1]:49670 200 - GET /healthz
[10.42.3.8]:37058 200 - GET /v1/models
8:42AM DBG Request received: {"model":"vicuna","file":"","language":"","response_format":"","size":"","prompt":null,"instruction":"","input":null,"stop":null,"messages":[{"role":"user","content":"How are you?"}],"stream":false,"echo":false,"top_p":0,"top_k":0,"temperature":0.9,"max_tokens":0,"n":0,"batch":0,"f16":false,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
8:42AM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
8:42AM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
How are you?
### Response:
8:42AM DBG Loading model llama from vicuna
8:42AM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 8 (mostly Q5_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
ggml_opencl: clGetPlatformIDs(NPLAT, platform_ids, &n_platforms) error -1001 at /build/go-llama/llama.cpp/ggml-opencl.cpp:344
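For context, error -1001 from clGetPlatformIDs is CL_PLATFORM_NOT_FOUND_KHR, i.e. the ICD loader cannot see any OpenCL platform inside the container. A quick hedged check, assuming clinfo is available in the image:
# Lists the OpenCL platforms/devices visible in the container; an empty list
# matches the -1001 (CL_PLATFORM_NOT_FOUND_KHR) failure above.
clinfo -l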
kc describe pod:
Name: localai-5c9b777d9f-hvmhc
Namespace: home
Priority: 0
Node: k3s-worker-00/192.168.3.20
Start Time: Sun, 04 Jun 2023 08:27:12 +0000
Labels: app.kubernetes.io/instance=localai
app.kubernetes.io/name=localai
pod-template-hash=5c9b777d9f
Annotations: <none>
Status: Running
IP: 10.42.5.135
IPs:
IP: 10.42.5.135
Controlled By: ReplicaSet/localai-5c9b777d9f
Init Containers:
download-model:
Container ID: containerd://6aac26a3e30bafc894fdd2e32b727a74dc09ae93e7d0af48e16b1fa243a08d2f
Image: busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
Image ID: docker.io/library/busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
## A simpler and more secure way if you have a way of staging an archive with the files you need
#wget "https://s3.domain.tld/public/stablediffusion.tar" -P /tmp
#tar -xzvf /tmp/stablediffusion.tar -C $MODELS_PATH
#rm -rf /tmp/stablediffusion.tar
## A more general and less secure way that grab all the files from github
## Details here: https://github.com/go-skynet/LocalAI
## And here: https://github.com/lenaxia/stablediffusion-bins/releases/tag/2023.05.24
mkdir $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-256-256-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-512-512-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/AutoencoderKL-base-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/FrozenCLIPEmbedder-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-256-256-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-512-512-MHA-fp16-opt.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/UNetModel-base-MHA-fp16.param" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/EdVince/Stable-Diffusion-NCNN/raw/main/x86/linux/assets/log_sigmas.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://raw.githubusercontent.com/EdVince/Stable-Diffusion-NCNN/main/x86/linux/assets/vocab.txt" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/UNetModel-MHA-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/FrozenCLIPEmbedder-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
wget "https://github.com/lenaxia/stablediffusion-bins/releases/download/2023.05.24/AutoencoderKL-encoder-512-512-fp16.bin" -P $MODELS_PATH/stablediffusion_assets
cat << "EOF" >> $MODELS_PATH/stablediffusion.yaml
name: stablediffusion
backend: stablediffusion
asset_dir: stablediffusion_assets
EOF
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 04 Jun 2023 08:27:28 +0000
Finished: Sun, 04 Jun 2023 08:28:35 +0000
Ready: True
Restart Count: 0
Environment:
URL: https://gpt4all.io/models/ggml-gpt4all-j.bin
MODELS_PATH: /models
Mounts:
/models from models (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rxq5d (ro)
Containers:
localai:
Container ID: containerd://c75d96c86aa0bad4a9cd40da20ce89e46427daeb8eba8618c2a7e3599f3b065b
Image: quay.io/go-skynet/local-ai:v1.17.1
Image ID: quay.io/go-skynet/local-ai@sha256:589f2d985aae9baca0813cb3282d2fe4a68d0a4dc2c7f352009f5941fa45c9ed
Port: 8080/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 04 Jun 2023 08:42:52 +0000
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Sun, 04 Jun 2023 08:37:27 +0000
Finished: Sun, 04 Jun 2023 08:42:39 +0000
Ready: False
Restart Count: 2
Limits:
gpu.intel.com/i915: 1
memory: 40000Mi
Requests:
cpu: 200m
gpu.intel.com/i915: 1
memory: 2000Mi
Liveness: http-get http://:8080/healthz delay=300s timeout=1s period=30s #success=1 #failure=4
Readiness: http-get http://:8080/readyz delay=300s timeout=1s period=30s #success=1 #failure=4
Environment:
THREADS: 8
CONTEXT_SIZE: 512
MODELS_PATH: /models
IMAGE_PATH: /tmp
BUILD_TYPE: clblas
GO_TAGS: stablediffusion
DEBUG: true
Mounts:
/models from models (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-rxq5d (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
models:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: localai-models
ReadOnly: false
kube-api-access-rxq5d:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/worker=true
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 20s
node.kubernetes.io/unreachable:NoExecute op=Exists for 20s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17m default-scheduler 0/7 nodes are available: 7 pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.
Warning FailedScheduling 17m default-scheduler 0/7 nodes are available: 7 pod has unbound immediate PersistentVolumeClaims. preemption: 0/7 nodes are available: 7 Preemption is not helpful for scheduling.
Normal Scheduled 17m default-scheduler Successfully assigned home/localai-5c9b777d9f-hvmhc to k3s-worker-00
Normal SuccessfulAttachVolume 17m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-97dd0f23-9834-4f4b-9516-65ef13ddafc6"
Normal Pulled 17m kubelet Container image "busybox@sha256:b5d6fe0712636ceb7430189de28819e195e8966372edfc2d9409d79402a0dc16" already present on machine
Normal Created 17m kubelet Created container download-model
Normal Started 17m kubelet Started container download-model
Warning BackOff 2m15s kubelet Back-off restarting failed container
Normal Pulled 2m2s (x3 over 16m) kubelet Container image "quay.io/go-skynet/local-ai:v1.17.1" already present on machine
Normal Created 2m2s (x3 over 16m) kubelet Created container localai
Normal Started 2m2s (x3 over 16m) kubelet Started container localai
If you are running this on Kubernetes, you'd need to expose your dri device to the pod. However, I haven't tested this inside Kubernetes yet, so I can't give you specific instructions to follow here.
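For what it's worth, a rough sketch of exposing the device via a plain hostPath mount, as an alternative to the Intel GPU device plugin resource already used in the pod spec above (untested on this setup):
# Hedged sketch (untested): mount the host's /dev/dri into the LocalAI container.
spec:
  containers:
    - name: localai
      volumeMounts:
        - name: dri
          mountPath: /dev/dri
  volumes:
    - name: dri
      hostPath:
        path: /dev/dri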
Whelp, I got it working; it requires the installation of OpenCL drivers. I'll get documentation for it written up so we can add it to the LocalAI.io docs.
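In the meantime, a rough sketch of an OpenCL driver install, assuming a Debian-based image and an Intel iGPU (package names are assumptions and may differ on other distros):
# Hedged sketch: install the Intel OpenCL ICD plus the ICD loader, then check that the GPU is visible.
apt-get update
apt-get install -y intel-opencl-icd ocl-icd-libopencl1 clinfo
clinfo | grep -i 'device name'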
I local-ai build info:
I BUILD_TYPE: clblas
I GO_TAGS: stablediffusion
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz LIBRARY_PATH=/build/go-llama:/build/go-stable-diffusion/:/build/gpt4all/gpt4all-bindings/golang/:/build/go-ggml-transformers:/build/go-rwkv:/build/whisper.cpp:/build/go-bert:/build/bloomz go build -ldflags "?=" -tags "stablediffusion" -o local-ai ./
Starting LocalAI using 8 threads, with models path: /models
6:01PM DBG Model: gpt4all-j (config: {OpenAIRequest:{Model:ggml-gpt4all-j File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:gpt4all-j StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:gpt4all-j TemplateConfig:{Completion:gpt4all-completion Chat:gpt4all-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:0 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
6:01PM DBG Model: vicuna (config: {OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.2 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:0 Debug:false Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:32 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]})
┌───────────────────────────────────────────────────┐
│ Fiber v2.46.0 │
│ http://127.0.0.1:8080 │
│ (bound on host 0.0.0.0 and port 8080) │
│ │
│ Handlers ............ 25 Processes ........... 1 │
│ Prefork ....... Disabled PID .............. 8081 │
└───────────────────────────────────────────────────┘
:0,"n_keep":0,"mirostat_eta":0,"mirostat_tau":0,"mirostat":0,"seed":0,"mode":0,"step":0}
6:10PM DBG Parameter Config: &{OpenAIRequest:{Model:vicuna File: Language: ResponseFormat: Size: Prompt:<nil> Instruction: Input:<nil> Stop:<nil> Messages:[] Stream:false Echo:false TopP:0.7 TopK:80 Temperature:0.9 Maxtokens:0 N:0 Batch:0 F16:false IgnoreEOS:false RepeatPenalty:0 Keep:0 MirostatETA:0 MirostatTAU:0 Mirostat:0 Seed:0 Mode:0 Step:0} Name:vicuna StopWords:[] Cutstrings:[] TrimSpace:[] ContextSize:1024 F16:false Threads:8 Debug:true Roles:map[] Embeddings:false Backend:llama TemplateConfig:{Completion:vicuna-completion Chat:vicuna-chat Edit:} MirostatETA:0 MirostatTAU:0 Mirostat:0 NGPULayers:32 ImageGenerationAssets: PromptCachePath: PromptCacheAll:false PromptStrings:[] InputStrings:[] InputToken:[]}
6:10PM DBG Template found, input modified to: Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
How are you?
### Response:
6:10PM DBG Loading model llama from vicuna
6:10PM DBG Loading model in memory from file: /models/vicuna
llama.cpp: loading model from /models/vicuna
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 1024
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 8 (mostly Q5_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0.09 MB
ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) HD Graphics 530'
ggml_opencl: device FP16 support: true
llama_model_load_internal: mem required = 10583.26 MB (+ 3216.00 MB per state)
ggml_opencl: offloading 32 layers to GPU
ggml_opencl: total VRAM used: 6655 MB
.
llama_init_from_file: kv self size = 1600.00 MB
llama_print_timings: load time = 55941.34 ms
llama_print_timings: sample time = 16.78 ms / 17 runs ( 0.99 ms per token)
llama_print_timings: prompt eval time = 255453.86 ms / 39 tokens ( 6550.10 ms per token)
llama_print_timings: eval time = 28250.62 ms / 16 runs ( 1765.66 ms per token)
llama_print_timings: total time = 289040.55 ms
6:14PM DBG Response: {"object":"chat.completion","model":"vicuna","choices":[{"message":{"role":"assistant","content":"I am doing well, thank you for asking. And how about you?"}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
[10.42.3.8]:51328 200 - POST /v1/chat/completions
But seriously, this Intel iGPU sucks. It's so much slower than even just the CPU.
4 threads, openblas
llama_print_timings: load time = 88639.48 ms
llama_print_timings: sample time = 12.25 ms / 17 runs ( 0.72 ms per token)
llama_print_timings: prompt eval time = 13018.65 ms / 39 tokens ( 333.81 ms per token)
llama_print_timings: eval time = 7248.51 ms / 16 runs ( 453.03 ms per token)
llama_print_timings: total time = 48560.37 ms
Hi! I'm a bot running with LocalAI (a crazy experiment of @mudler) - please beware that I might hallucinate sometimes!
But... I can also be funny or helpful :smile_cat: and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
To provide support for OpenCL, you can build LocalAI with BUILD_TYPE=clblas. This will use CLBlast to provide OpenCL GPU acceleration. You will also need to specify a CLBlast directory using the environment variable CLBLAST_DIR. Here is an example command to build LocalAI with CLBlast support:
export BUILD_TYPE=clblas
export CLBLAST_DIR=/path/to/clblast
make BUILD_TYPE=clblas build
Note that you will need to have OpenCL installed and set up on your system before building LocalAI. Additionally, if you want to use the pre-configured models feature, you can specify a list of models in the environment variable PRELOAD_MODELS. Each model should have a URL to its YAML configuration file and a name to use for the model. For example:
export PRELOAD_MODELS=["{\"url\": \"github:go-skynet/model-gallery/gpt4all-j.yaml\", \"name\": \"gpt-3.5-turbo\"}"]
This will automatically download and load the gpt4all-j model as "gpt-3.5-turbo" when starting LocalAI.
I think it would be good if the instructions around getting CLBlast to work were updated.
So far, on Fedora, I can't get LocalAI with CLBlast to build: make fails with ld complaining about missing libraries, despite the build taking place within the official Docker container, with CLBlast installed, mounted as a volume, and with CLBLAST_DIR set.
...
CGO_LDFLAGS="-lOpenCL -lclblast" C_INCLUDE_PATH=/build/go-ggml-transformers LIBRARY_PATH=/build/go-ggml-transformers \
go build -ldflags "-X "github.com/go-skynet/LocalAI/internal.Version=v1.22.0-10-g0883d32" -X "github.com/go-skynet/LocalAI/internal.Commit=0883d324d9b29b12e8417aa20d6458a77f62aab1"" -tags "" -o backend-assets/grpc/falcon-ggml ./cmd/grpc/falcon-ggml/
# github.com/go-skynet/LocalAI/cmd/grpc/falcon-ggml
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
collect2: error: ld returned 1 exit status
make: *** [Makefile:387: backend-assets/grpc/falcon-ggml] Error 1
env:
BUILD_TYPE=clblas
CLBLAST_DIR=/mnt/clblast
docker-compose:
localai:
image: quay.io/go-skynet/local-ai:master
...
build:
context: ${MOUNT_DOCKER_DATA}/LocalAI/git/
dockerfile: Dockerfile
ports:
- 8888:8080
env_file:
- localai/env
volumes:
- ${MOUNT_DOCKER_DATA}/LocalAI/models:/models
- ${MOUNT_DOCKER_DATA}/LocalAI/clblast:/mnt/clblast
...
clblast:
/opt/docker-data/LocalAI/clblast
ls -1
bin/
include/
lib/
ls -1 bin/
clblast_sample_cache_c*
clblast_sample_dgemv_c*
clblast_sample_dtrsm*
clblast_sample_haxpy_c*
clblast_sample_samax_c*
clblast_sample_sasum_c*
clblast_sample_sgemm*
clblast_sample_sgemm_batched*
clblast_sample_sgemm_c*
clblast_sample_tuning_api*
clblast_tuner_copy_fast*
clblast_tuner_copy_pad*
clblast_tuner_invert*
clblast_tuner_routine_xgemm*
clblast_tuner_routine_xtrsv*
clblast_tuner_transpose_fast*
clblast_tuner_transpose_pad*
clblast_tuner_xaxpy*
clblast_tuner_xconvgemm*
clblast_tuner_xdot*
clblast_tuner_xgemm*
clblast_tuner_xgemm_direct*
clblast_tuner_xgemv*
clblast_tuner_xger*
ls -1 lib/
cmake/
libclblast.so -> libclblast.so.1
libclblast.so.1 -> libclblast.so.1.6.1
libclblast.so.1.6.1
pkgconfig/
I have run into the exact same issue, but using make manually on an Ubuntu 22.04 system (no Docker). I used the following steps (following https://localai.io/basics/build/index.html where applicable):
mkdir -p ~/src && cd ~/src
INSTALL_PREFIX=~/src/install
mkdir -p ${INSTALL_PREFIX}
git clone --recurse-submodules https://github.com/KhronosGroup/OpenCL-SDK.git
mkdir -p OpenCL-SDK/build
cd OpenCL-SDK/build
cmake .. -DBUILD_DOCS=OFF -DBUILD_EXAMPLES=OFF -DBUILD_TESTING=OFF -DOPENCL_SDK_BUILD_SAMPLES=OFF -DOPENCL_SDK_TEST_SAMPLES=OFF
cmake --build . --config Release -j 7
cmake --install . --prefix ${INSTALL_PREFIX}
cd ~/src
git clone https://github.com/CNugteren/CLBlast.git
mkdir -p CLBlast/build
cd CLBlast/build
cmake .. -DBUILD_SHARED_LIBS=OFF -DTUNERS=OFF -DOPENCL_ROOT=${INSTALL_PREFIX}
cmake --build . --config Release -j 7
cmake --install . --prefix ${INSTALL_PREFIX}
cd ~/src
git clone --recurse-submodules -j8 https://github.com/mudler/LocalAI.git
cd LocalAI
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF" CLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/ make BUILD_TYPE=clblas GO_TAGS=tts build
The full log is here, but it fails at the same point as it does for @sammcj.
I also tried building with the following (making sure to make clean first):
CMAKE_ARGS="-DLLAMA_F16C=OFF -DLLAMA_AVX512=OFF -DLLAMA_AVX2=OFF -DLLAMA_FMA=OFF -DLLAMA_CLBLAST=ON -DCLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/" CLBLAST_DIR=${INSTALL_PREFIX}/lib make BUILD_TYPE=clblas GO_TAGS=tts build
And also tried with CLBLAST_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast/
~/src/install/lib$ ls -1
cmake/
libclblast.a
libOpenCLExt.a
libOpenCL.so
libOpenCL.so.1
libOpenCL.so.1.2
libOpenCLUtils.a
libOpenCLUtilsCpp.a
pkgconfig/
I also tried removing falcon-ggml from the Makefile to see if I could work around it, but still ran into linker issues with OpenCL and clblast on the next target in the Makefile:
...
CGO_LDFLAGS=" -lOpenCL -lclblast" C_INCLUDE_PATH=~/src/LocalAI/sources/go-bert LIBRARY_PATH=~/src/LocalAI/sources/go-bert \
go build -ldflags " -X "github.com/go-skynet/LocalAI/internal.Version=v2.0.0-16-g89ff123" -X "github.com/go-skynet/LocalAI/internal.Commit=89ff12309daac0a9a4f6e85b7cfc3833995d4e82"" -tags "tts" -o backend-assets/grpc/bert-embeddings ./backend/go/llm/bert/
# github.com/go-skynet/go-bert.cpp
# github.com/go-skynet/LocalAI/backend/go/llm/bert
/usr/local/go/pkg/tool/linux_amd64/link: running g++ failed: exit status 1
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
/usr/bin/ld: cannot find -lOpenCL
/usr/bin/ld: cannot find -lclblast
collect2: error: ld returned 1 exit status
make: *** [Makefile:512: backend-assets/grpc/bert-embeddings] Error 1
EDIT: I am able to build and run llama.cpp with CLBlast without issue.
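For reference, a hedged sketch of a standalone llama.cpp CLBlast build against the same install prefix as above (exact flags are an assumption and may differ between llama.cpp versions):
# Hedged sketch: build llama.cpp with CLBlast using the OpenCL/CLBlast installed into ${INSTALL_PREFIX} above.
git clone https://github.com/ggerganov/llama.cpp
mkdir -p llama.cpp/build && cd llama.cpp/build
cmake .. -DLLAMA_CLBLAST=ON -DCLBlast_DIR=${INSTALL_PREFIX}/lib/cmake/CLBlast
cmake --build . --config Release -j 7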
Is your feature request related to a problem? Please describe.
No support for OpenCL.
Describe the solution you'd like
Implementation of CLBlast to provide support for OpenCL.
Describe alternatives you've considered
None, as implementing something for ROCm (if it were something more dedicated) would leave out pre-RDNA graphics cards from AMD, several integrated GPUs from Intel, etc.
Additional context
Mostly wanted for AMD and Intel iGPUs.