LlamaLib implements an API for the llama.cpp server. The focus of this project is to:
Each release contains:
The following architectures are provided:
- *-noavx (Windows/Linux): support for CPUs without AVX instructions (operates on all CPUs as well)
- *-avx (Windows/Linux): support for CPUs with AVX instructions
- *-avx2 (Windows/Linux): support for CPUs with AVX2 instructions
- *-avx512 (Windows/Linux): support for CPUs with AVX512 instructions
- *-cuda-cu11.7.1 (Windows/Linux): support for Nvidia GPUs with CUDA 11 (CUDA doesn't need to be separately installed)
- *-cuda-cu12.2.0 (Windows/Linux): support for Nvidia GPUs with CUDA 12 (CUDA doesn't need to be separately installed)
- *-hip (Windows/Linux): support for AMD GPUs with AMD HIP (HIP doesn't need to be separately installed)
- *-vulkan (Windows/Linux): support for most GPUs independent of manufacturer
- macos-*-acc (macOS arm64/x64): support for macOS with the Accelerate framework
- macos-*-no_acc (macOS arm64/x64): support for macOS without the Accelerate framework

In addition, the windows-archchecker and linux-archchecker libraries are used to determine the presence and type of AVX instructions on Windows and Linux.
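As a rough illustration of the kind of detection the archchecker libraries perform, the CPU's AVX capabilities can also be checked manually on Linux. This is only a sketch based on /proc/cpuinfo flags, not the archchecker implementation:

```
# Illustrative sketch (Linux only): choose a CPU build based on /proc/cpuinfo flags.
# The linux-archchecker/windows-archchecker libraries do this detection programmatically.
if grep -q avx512 /proc/cpuinfo; then
  echo "use a *-avx512 build"
elif grep -qw avx2 /proc/cpuinfo; then
  echo "use a *-avx2 build"
elif grep -qw avx /proc/cpuinfo; then
  echo "use a *-avx build"
else
  echo "use a *-noavx build"
fi
```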
The server command-line help can be shown by running ./undreamai_server -h on Linux/macOS or undreamai_server.exe -h on Windows for the architecture of interest.
More information on the different options can be found in the llama.cpp server README.
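For illustration, a typical startup command might look like the following. The model path and port are placeholders, and the flags shown (-m, --host, --port, -ngl) are standard llama.cpp server options that are assumed to pass through to undreamai_server; check the -h output of your build to confirm:

```
# Illustrative example (placeholder model path and port):
# start the server with a GGUF model, listening on all interfaces.
./undreamai_server -m models/my-model.gguf --host 0.0.0.0 --port 8080

# For a CUDA/HIP/Vulkan build, layers can additionally be offloaded to the GPU:
./undreamai_server -m models/my-model.gguf --host 0.0.0.0 --port 8080 -ngl 99
```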
The server binaries can be used to deploy remote servers for LLMUnity.
You can print the required command within Unity by running the scene.
More information can be found in the Use a remote server section of the LLMUnity Readme.
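To verify that a remote deployment is reachable, the server can be queried over HTTP. The sketch below assumes undreamai_server exposes the standard llama.cpp server endpoints (/health and /completion) and that it was started as in the example above; the host and port are placeholders:

```
# Placeholders: replace <server-ip> and the port with your deployment's values.
# Health check:
curl http://<server-ip>:8080/health

# Minimal completion request:
curl http://<server-ip>:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 16}'
```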