mudler / LocalAI

The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more model architectures. Features: Generate Text, Audio, Video, Images, Voice Cloning, Distributed inference
https://localai.io
MIT License

single binary #1888

Open sozercan opened 5 months ago

sozercan commented 5 months ago

Is your feature request related to a problem? Please describe.

LocalAI should support a single binary instead of multiple options for AVX, AVX2, CUDA, etc.

Describe the solution you'd like

Support for a single binary that can check capabilities and fall back when needed. It should start with the GPU by checking for libraries, then adjust layers if there is not enough VRAM, and finally fall back to CPU and adjust the instruction set depending on the host capabilities.

This will make AIO simpler, as the logic will be handled automatically inside the binary.
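
The fallback chain described above could be sketched roughly as follows. This is only an illustration of the idea, not LocalAI code: `HostCaps`, `Plan`, `choosePlan`, and the per-layer VRAM estimate are all hypothetical names and assumptions, and the probing itself (loading GPU libraries, querying VRAM, reading CPU flags) is left out so that the decision logic stays a pure function.

```go
package main

import "fmt"

// HostCaps is a hypothetical snapshot of what the host offers.
// A real binary would fill it by probing for GPU libraries,
// querying free VRAM, and reading the CPU feature flags.
type HostCaps struct {
	GPULibsFound bool     // a usable GPU runtime library was found
	FreeVRAMMiB  int      // free video memory, in MiB
	CPUFlags     []string // e.g. "avx", "avx2"
}

// Plan is the outcome of the fallback chain.
type Plan struct {
	Device     string // "gpu" or "cpu"
	GPULayers  int    // layers offloaded to the GPU (0 on CPU)
	CPUVariant string // "avx2", "avx", or "generic"
}

// bestCPUVariant returns the most capable instruction set the host reports.
func bestCPUVariant(flags []string) string {
	has := map[string]bool{}
	for _, f := range flags {
		has[f] = true
	}
	for _, v := range []string{"avx2", "avx"} {
		if has[v] {
			return v
		}
	}
	return "generic"
}

// choosePlan walks the chain: try the GPU first, offload fewer layers
// if VRAM is short, and fall back to CPU if nothing fits.
// mibPerLayer is an assumed, caller-supplied estimate.
func choosePlan(c HostCaps, totalLayers, mibPerLayer int) Plan {
	p := Plan{Device: "cpu", CPUVariant: bestCPUVariant(c.CPUFlags)}
	if !c.GPULibsFound || mibPerLayer <= 0 {
		return p
	}
	fit := c.FreeVRAMMiB / mibPerLayer
	if fit > totalLayers {
		fit = totalLayers
	}
	if fit <= 0 {
		return p // not even one layer fits: stay on CPU
	}
	p.Device = "gpu"
	p.GPULayers = fit
	return p
}

func main() {
	caps := HostCaps{GPULibsFound: true, FreeVRAMMiB: 4096, CPUFlags: []string{"avx", "avx2"}}
	fmt.Printf("%+v\n", choosePlan(caps, 32, 256))
}
```

Keeping the decision separate from the probing means the chain can be tested without any GPU present, and the CPU variant is always computed so it is ready if the GPU path fails later.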

Subtasks:

Describe alternatives you've considered

Additional context

mudler commented 5 months ago

Good point, and one I keep coming back to. One of the real challenges here is the libraries needed to make the GPU work.

For instance, on Intel GPUs you need the Intel kit, and that is expensive in terms of dependencies and container image size. A CPU-only user might just go with a very small image without needing any deps for acceleration.

However, starting with a single binary that bundles the pre-compiled versions with their specific flagsets is the way to go; we can take care of the runtime dependencies later. We can already start by trying to squeeze all the backends built with those flagsets into a single build.
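
As a rough illustration of picking among backends bundled in one build, the sketch below selects a variant from CPU flags in `/proc/cpuinfo`-style text. The variant names (`backend-avx512` and so on), the flag requirements, and the helper functions are assumptions made for this example, not LocalAI's actual layout.

```go
package main

import (
	"fmt"
	"strings"
)

// variant describes one pre-compiled backend build bundled in the binary.
// Names and flag requirements are illustrative assumptions.
type variant struct {
	Name     string
	Requires []string
}

// bundled is ordered from most to least demanding;
// the last entry has no requirements and always matches.
var bundled = []variant{
	{"backend-avx512", []string{"avx512f"}},
	{"backend-avx2", []string{"avx2"}},
	{"backend-avx", []string{"avx"}},
	{"backend-fallback", nil},
}

// parseFlags extracts the flag set from /proc/cpuinfo-style text,
// using the first line whose key is "flags".
func parseFlags(cpuinfo string) map[string]bool {
	flags := map[string]bool{}
	for _, line := range strings.Split(cpuinfo, "\n") {
		key, val, ok := strings.Cut(line, ":")
		if !ok || strings.TrimSpace(key) != "flags" {
			continue
		}
		for _, f := range strings.Fields(val) {
			flags[f] = true
		}
		break
	}
	return flags
}

// pickVariant returns the first bundled variant whose requirements
// are all present on the host.
func pickVariant(flags map[string]bool) string {
	for _, v := range bundled {
		ok := true
		for _, r := range v.Requires {
			if !flags[r] {
				ok = false
				break
			}
		}
		if ok {
			return v.Name
		}
	}
	return "" // not reached while the list ends with a no-requirement entry
}

func main() {
	sample := "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"
	fmt.Println(pickVariant(parseFlags(sample)))
}
```

Reading `/proc/cpuinfo` only works on Linux; a portable build would need another source for the flags, such as a CPU feature detection package.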

localai-bot commented 5 months ago

I'm an AI bot, assisting with auditing tickets in the LocalAI repository.

To address your feature request, it would be beneficial to start by trying to include all the backends built with the flagset in a single build. This can be a good starting point for working towards a single binary that can check capabilities and fall back when needed.

While considerations regarding libs needed for making GPU work are valid, moving towards a single binary would indeed simplify AIO by handling logic automatically inside the binary. Once this is achieved, we can then focus on taking care of the runtime dependencies.

I'll update the issue with your input and thoughts. Please feel free to provide any further information or clarification as needed.

sozercan commented 3 months ago

Updated the issue with subtasks.