hacker-DOM opened this issue 1 year ago
Since the blog post was written (Aug 2023), llama.cpp switched to a new file format (GGUF instead of GGML). GGML is no longer supported by llama.cpp, so that might be the issue. Try it with the equivalent GGUF file: https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q8_0.gguf.
Is there no way to switch to a compatible llama.cpp?

I'm on a metered and slow connection, and it already took two tries (because llm doesn't yet support resuming downloads, which Hugging Face does) to receive the file...
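In the meantime, a resumable download is possible with plain curl, since Hugging Face serves files with HTTP range support. A minimal sketch (the output filename is just an example):

# -C - resumes from wherever the partial file left off; -L follows redirects
curl -L -C - -o llama-2-7b-chat.Q8_0.gguf \
  https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q8_0.gguf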
If you already have the GGML file, you should be able to convert it using the convert-llama-ggml-to-gguf.py script from llama.cpp. You also don't need to download the model using llm for it to work with llm: you can put the file in your models folder and edit models.json to include it. Going back to a "compatible" older version might introduce a bunch of dependency issues.
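A sketch of both options, assuming the convert-llama-ggml-to-gguf.py script from a llama.cpp checkout and the add-model command from recent llm-llama-cpp releases (the filenames are examples; double-check the flag names against your versions):

# convert the already-downloaded GGML file to a new GGUF file
python convert-llama-ggml-to-gguf.py \
  --input llama-2-7b-chat.ggmlv3.q8_0.bin \
  --output llama-2-7b-chat.Q8_0.gguf

# register the resulting file with llm without re-downloading anything
llm llama-cpp add-model llama-2-7b-chat.Q8_0.gguf \
  --alias llama2-chat --alias l2c --llama2-chat

add-model updates models.json for you, so hand-editing it should only be needed if you want to adjust aliases afterwards.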
I downloaded a GGUF model, but it still gives the bare Error: output with no detailed message.
llm llama-cpp download-model \
  https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q8_0.gguf \
  --alias llama2-chat --alias l2c --llama2-chat
After installing, I tried to run the two commands below, but both returned the same Error: message.
llm prompt -m l2c 'Tell me a joke about a llama'
llm -m l2c 'Tell me a joke about a llama'
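One way to get a real traceback when llm prints only Error: is to load the file directly with the llama-cpp-python bindings the plugin uses, bypassing llm's error handling (adjust the path to wherever your model actually lives):

# if the file or the bindings are broken, this raises the underlying exception
python -c 'from llama_cpp import Llama; Llama(model_path="llama-2-7b-chat.Q8_0.gguf")'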
Could you run this command and paste in the output?

llm plugins

You should have 0.2b1 of llm-llama-cpp - otherwise you'll need to upgrade it with:

llm install -U llm-llama-cpp
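llm plugins prints JSON, so the output should include an entry roughly like this (the hooks list may differ between releases):

[
  {
    "name": "llm-llama-cpp",
    "hooks": [
      "register_commands",
      "register_models"
    ],
    "version": "0.2b1"
  }
]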
You may also have better success with https://github.com/simonw/llm-gpt4all - the most recent release of that has been giving me really good results.
I was seeing the same "Error:" issue with a GGML model. After downloading and switching to the GGUF model, I am seeing:
/Users/Sanjay/Documents/workspaces/AI/llama-2-mac-simonwillison> llm -m l2c 'Tell me a joke about a llama'
Abort
/Users/Sanjay/Documents/workspaces/AI/llama-2-mac-simonwillison>
Additional notes:
Thread 6 Crashed:: Dispatch queue: ggml-metal
0   libsystem_kernel.dylib    0x7ff8138ebffe  __pthread_kill + 10
1   libsystem_pthread.dylib   0x7ff8139221ff  pthread_kill + 263
2   libsystem_c.dylib         0x7ff81386dd24  abort + 123
3   libllama.dylib            0x10facafe8     0x10f9d2000 + 1019880
4   libllama.dylib            0x10fa5ac14     0x10f9d2000 + 560148
5   libdispatch.dylib         0x7ff81376b34a  _dispatch_client_callout2 + 8
6   libdispatch.dylib         0x7ff81377cb8b  _dispatch_apply_redirect_invoke + 418
7   libdispatch.dylib         0x7ff81376b317  _dispatch_client_callout + 8
8   libdispatch.dylib         0x7ff81377ac0c  _dispatch_root_queue_drain + 673
9   libdispatch.dylib         0x7ff81377b25c  _dispatch_worker_thread2 + 160
10  libsystem_pthread.dylib   0x7ff81391ef8a  _pthread_wqthread + 256
11  libsystem_pthread.dylib   0x7ff81391df57  start_wqthread + 15
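The abort happens on the ggml-metal dispatch queue, so one thing worth trying (a guess, not a confirmed fix) is rebuilding llama-cpp-python with the Metal backend disabled, to see whether CPU-only inference works:

# reinstall the bindings without Metal; if the crash disappears,
# the problem is in the Metal code path rather than the model file
CMAKE_ARGS="-DLLAMA_METAL=off" pip install --force-reinstall --no-cache-dir llama-cpp-python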
Hi! I have followed every step in Run Llama 2 on your own Mac using LLM and Homebrew. Any ideas how I can debug this issue?

Update: I am on macOS Sonoma (14.0), M1 arch.