mudler / LocalAI

:robot: The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

Docker support for Apple silicon #1659

Open szegedim opened 6 months ago

szegedim commented 6 months ago

Is your feature request related to a problem? Please describe.

Thank you for putting this together. It helped me a lot to learn the big picture of LLMs.

I tried to build and run it on Apple silicon and ran into some issues.

Describe the solution you'd like

I managed to fix it by using the Ubuntu 22.04 Docker image instead of the earlier and newer ones. Here is the Dockerfile that worked for me for an Ubuntu container on Apple silicon (arm64).

```Dockerfile
FROM ubuntu:22.04
# sha256:e6173d4dc55e76b87c4af8db8821b1feae4146dd47341e4d431118c7dd060a74

RUN apt update
RUN apt install -y cmake golang-1.21-go protobuf-compiler-grpc protobuf-compiler libgrpc++-dev wget curl patch git

RUN git clone https://github.com/go-skynet/LocalAI.git /opt/localai
WORKDIR /opt/localai

ENV BUILD_GRPC_FOR_BACKEND_LLAMA=true
ENV PATH=$PATH:/usr/lib/go-1.21/bin

RUN make build

# Download a model and its matching prompt template into the models directory
RUN mkdir -p models && wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
RUN cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/

# Start the API server
CMD ./local-ai --models-path=./models/ --debug=true
```
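
For completeness, a minimal sketch of building and running this image (the image name `localai-arm64` and the port mapping are just examples; LocalAI listens on port 8080 by default):

```sh
# Build the image from the directory containing the Dockerfile
docker build -t localai-arm64 .

# Run it and expose the API on localhost:8080
docker run -p 8080:8080 localai-arm64
```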

Describe alternatives you've considered

I was considering using brew, but I was concerned about running gigabytes of new code from the internet that requires privileged access. I tried running it in Docker to assess the performance: about 10 seconds per subsequent answer on a Mac Studio. I need to do more work to assess whether it can leverage all the AI and GPU capabilities of the silicon.

Additional context

I can open a pull request if you are interested.

Regards, Miklos

jonmach commented 6 months ago

I just tried to build using your Dockerfile on Apple silicon (M2), and it fails at step 6/8 on the following line:

```
g++ -I. -I./ggml.cpp/include -I./ggml.cpp/include/ggml/ -I./ggml.cpp/examples/ -O3 -DNDEBUG -std=c++17 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native starcoder.cpp -o starcoder.o -c

155.3 In file included from starcoder.cpp:19:
155.3 ggml.cpp/examples/starcoder/main.cpp: In function 'bool starcoder_model_load(const string&, starcoder_model&, gpt_vocab&)':
155.3 ggml.cpp/examples/starcoder/main.cpp:148:34: warning: loop variable 'token' of type 'const string&' {aka 'const std::__cxx11::basic_string&'} binds to a temporary constructed from type 'const char* const' [-Wrange-loop-construct]
155.3   148 |     for (const std::string & token : {
155.3       |                              ^~~~~
155.3 ggml.cpp/examples/starcoder/main.cpp:148:34: note: use non-reference type 'const string' {aka 'const std::__cxx11::basic_string'} to make the copy explicit or 'const char* const&' to prevent copying
155.3 In file included from starcoder.cpp:19:
155.3 ggml.cpp/examples/starcoder/main.cpp: In function 'int main_starcoder(int, char**)':
155.3 ggml.cpp/examples/starcoder/main.cpp:799:23: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector::size_type' {aka 'long unsigned int'} [-Wsign-compare]
155.3   799 |     for (int i = 0; i < embd_inp.size(); i++) {
155.3       |                       ^~~~~~~
```

Is there a specific build of Docker you're using?

jonmach commented 6 months ago

To be honest, I only need this to be an embedding server. I'll be using LM Studio as my LLM server.

Would you have a recommendation on what the Dockerfile would look like to support this cut-down build?

TheDarkTrumpet commented 6 months ago

I'd check whether Docker supports passing the Metal interface through to containers. I did a bit of searching, and from what I can tell, it does not.

I run an M2 Mac and have some LLM stuff running on it, but I haven't attempted to do it through Docker. Without Metal, inference is incredibly slow (it runs in CPU mode). A standard build of the project (without Docker) shouldn't be too bad from a dependency standpoint, but the interfaces with llama.cpp or transformers will likely need some extra care.

When building llama.cpp, you need to use `make LLAMA_METAL=1` to allow it to use the GPU; a rough sketch follows.
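
A rough sketch, assuming llama.cpp's Makefile-based build (the clone location and the LocalAI equivalent noted in the comments are my assumptions, not something verified in this thread):

```sh
# Clone llama.cpp and build it with Metal support enabled
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_METAL=1

# For LocalAI itself, I believe the equivalent is:
#   make BUILD_TYPE=metal build
```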

I haven't run LocalAI on my Mac, but I have spent time getting text-gen-webui (which also supports embeddings) working on it. If you go that route, you can set a lot of the flags through pip; let me know if you plan to, and I can provide most of the relevant pip commands to get it working on the Mac.

jonmach commented 6 months ago

Thanks for coming back on this. In the end, I gave up on LocalAI. It was just too bulky (34 GB) for the small area of functionality I need, and a little too temperamental to build. I found an excellent embeddings server: small enough, very fast, runs locally (in Docker if needed), and supports many models:

https://github.com/huggingface/text-embeddings-inference
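
A minimal sketch of running it on CPU in Docker (the image tag and model id below are placeholder examples; check the repository for the current ones):

```sh
# Serve an embedding model on localhost:8080 (the container listens on port 80)
docker run -p 8080:80 -v "$PWD/tei-data:/data" \
  ghcr.io/huggingface/text-embeddings-inference:cpu-latest \
  --model-id BAAI/bge-small-en-v1.5
```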

TheDarkTrumpet commented 6 months ago

*nods* I understand. Getting responses here can take a while, and I'm glad you found something that works. Generating embeddings is a fairly fast process, so even if you're stuck with using the CPU (vs. Metal), it may not matter much for your purposes unless you scale out more (creating thousands of embeddings). I'm still pretty confident that Docker on the Mac doesn't support Metal passthrough.

That said, I was curious how hard it would be to get LocalAI to build on the Mac; I figure I can use it anyway. I had to install one extra package (grpc from Homebrew), but the rest went well. The footprint is 3 GB before models, and transformer models for embeddings are quite small. The build steps are sketched below.
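
A rough sketch of those steps (treat the exact commands, in particular `BUILD_TYPE=metal`, as approximations rather than a verified recipe):

```sh
# Extra dependencies from Homebrew (grpc was the one missing for me)
brew install grpc protobuf

# Build LocalAI natively with Metal acceleration
git clone https://github.com/mudler/LocalAI.git
cd LocalAI
make BUILD_TYPE=metal build

# Point it at a local models directory
./local-ai --models-path ./models --debug=true
```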

jonmach commented 6 months ago

I think you're correct that Docker doesn't support Metal. For my purposes (dev/test etc.) I needed to be able to fail fast, and CPU was good enough since this was functional testing rather than performance testing.

A 3 GB footprint seems good for what you're getting with LocalAI: full LLM and embedding support.

I found it very hard to build LocalAI, so I downloaded the official LocalAI containers. However, even though I uploaded multiple embedding models into the right directories, I found it difficult to get working. On balance, the return on time invested just didn't make sense.

LM Studio is tiny and very, very stable at serving LLMs, and it also supports Metal. It doesn't serve embeddings, but that is no longer an issue.

qdrddr commented 2 months ago

Having LocalAI in a container on my macOS/arm64 machine, with the Metal framework, would simplify life greatly...

sozercan commented 2 months ago

The Docker engine doesn't run natively on macOS; it runs inside a linux/arm64 VM via virtualization on Apple silicon (similarly, linux/amd64 on Intel).

If you want native Metal acceleration, you'll need to run the native binary.

qdrddr commented 2 months ago

> The Docker engine doesn't run natively on macOS; it runs inside a linux/arm64 VM via virtualization on Apple silicon (similarly, linux/amd64 on Intel).
>
> If you want native Metal acceleration, you'll need to run the native binary.

The problem with running the LocalAI binary on macOS is that the libraries LocalAI requires keep ending up misaligned with the library versions available on the Mac, especially since brew keeps updating them; I hope that part can be improved in LocalAI. So in practice I cannot run LocalAI on my Mac either way: Docker doesn't work for me, and the binary fails most of the time.

@jonmach

TheDarkTrumpet commented 1 month ago

> The Docker engine doesn't run natively on macOS; it runs inside a linux/arm64 VM via virtualization on Apple silicon (similarly, linux/amd64 on Intel). If you want native Metal acceleration, you'll need to run the native binary.
>
> The problem with running the LocalAI binary on macOS is that the libraries LocalAI requires keep ending up misaligned with the library versions available on the Mac, especially since brew keeps updating them; I hope that part can be improved in LocalAI. So in practice I cannot run LocalAI on my Mac either way: Docker doesn't work for me, and the binary fails most of the time.
>
> @jonmach

I'd consider just building from source. I use LocalAI on my Mac a fair amount and haven't had any issues after building.