not-lain opened 3 weeks ago
I found something interesting: if you go to https://api.github.com/repos/Mozilla-Ocho/llamafile/releases/latest and check ["assets"][0], you will find the ["name"] as well as the ["browser_download_url"] there, which can be used to automatically update the snippets.
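Reading those two fields out of the latest-release response could be sketched like this (a minimal sketch; the sample payload below is illustrative, not a real API response, and only mirrors the fields mentioned above — in practice the JSON would come from an HTTP GET of the URL above):

```python
import json

# Illustrative sample shaped like the GitHub latest-release response;
# the real payload comes from
# https://api.github.com/repos/Mozilla-Ocho/llamafile/releases/latest
# (version "0.0.0" is a placeholder, not a real release).
sample = json.loads("""
{
  "tag_name": "0.0.0",
  "assets": [
    {
      "name": "llamafile-0.0.0",
      "browser_download_url": "https://example.com/llamafile-0.0.0"
    }
  ]
}
""")

# The two fields the snippets need: ["assets"][0]["name"]
# and ["assets"][0]["browser_download_url"]
asset = sample["assets"][0]
name = asset["name"]
url = asset["browser_download_url"]
print(name, url)
```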
llamafile is a local app (similar to llama.cpp) for distributing and running LLMs as a single file
the library can be used with both .gguf and .llamafile files
repo: https://github.com/Mozilla-Ocho/llamafile
snippets:
- linux and mac
- windows (download the binary and rename it to add a .exe extension)
- gguf

notes:
- on windows, you can run .\llamafile-0.8.13 -m foo.llamafile to get around the executable size limit (similar to the GGUF snippet)
- running ./llava-v1.5-7b-q4.llamafile will launch an HTTP server, open a tab in your desktop's browser, and let you chat with the model, upload an image file, ask it to analyze what it sees, etc.
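Putting the pieces together, auto-generating the per-platform snippets from the release asset could look like this sketch (the helper name and command templates are assumptions for illustration, not the library's actual API; the windows variant just appends .exe as described in the notes):

```python
def make_snippets(name: str, url: str) -> dict:
    """Render per-platform download/run snippets for a release asset.

    `name` and `url` would come from ["assets"][0] of the latest-release
    API response; the command templates here are illustrative, and
    "model.gguf" is a placeholder model path.
    """
    return {
        "linux and mac": (
            f"wget {url}\n"
            f"chmod +x {name}\n"
            f"./{name} -m model.gguf"
        ),
        # on windows the binary must carry a .exe extension
        "windows": (
            f"wget -O {name}.exe {url}\n"
            f".\\{name}.exe -m model.gguf"
        ),
    }

# placeholder values standing in for the real asset fields
snippets = make_snippets("llamafile-0.0.0", "https://example.com/llamafile-0.0.0")
print(snippets["linux and mac"])
```

With this shape, refreshing the snippets is just a matter of re-reading the two asset fields whenever a new release lands.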