umbertogriffo / rag-chatbot

RAG (Retrieval-augmented generation) ChatBot that provides answers based on contextual information extracted from a collection of Markdown files.
Apache License 2.0

chatbot fails with llama 3.1 on Mac metal (M1) #8

Closed — anujphadke closed this issue 2 months ago

anujphadke commented 2 months ago

chatbot fails with "llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291"  on Mac metal (M1)

I ran the following command:

```shell
streamlit run chatbot/chatbot_app.py -- --model llama-3 --max-new-tokens 1024
```

```
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: failed to load model [6150008832]
2024-08-17 12:06:04,537 - main - ERROR - An error occurred: Failed to load model from file: rag-chatbot/models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
```

I re-ran make setup_metal with the latest commit and it still fails.

umbertogriffo commented 2 months ago

Hi @anujphadke, as you can see here, the same issue was reported on the official llama_cpp_python repo.

Try bumping llama_cpp_python to 0.2.85 or a newer version.
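A quick way to check whether the installed version is recent enough — a minimal sketch, assuming the package exposes `__version__` (llama-cpp-python does) and a plain `x.y.z` version string:

```python
# Check that the installed llama-cpp-python is at least 0.2.85,
# the version where the wrong-number-of-tensors bug is fixed.

MIN_VERSION = "0.2.85"

def version_tuple(version: str) -> tuple:
    """Turn '0.2.85' into (0, 2, 85) for a numeric comparison."""
    return tuple(int(part) for part in version.split("."))

def is_recent_enough(installed: str, minimum: str = MIN_VERSION) -> bool:
    """True if the installed version is >= the minimum required one."""
    return version_tuple(installed) >= version_tuple(minimum)

if __name__ == "__main__":
    try:
        import llama_cpp
        print(llama_cpp.__version__, is_recent_enough(llama_cpp.__version__))
    except ImportError:
        print("llama-cpp-python is not installed")
```

Note that a simple tuple comparison like this would choke on suffixed versions such as `0.2.85.post1`; it is just a sanity check, not a replacement for proper version parsing.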

I didn't notice this because I use llama3 only on Linux machines, and phi-3 on the MacBook M1 since it's lightweight.

Let me know if it works.

umbertogriffo commented 2 months ago

I've tested Llama 3.1 with llama_cpp_python 0.2.85 on M1, and it works. I've also updated the repo, so you need to run a git fetch and a git pull.
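After upgrading, a quick smoke test like the sketch below can confirm the model now loads. The model path comes from the error log above; `n_gpu_layers=-1` (offload all layers to Metal) is an assumption about how you want to run it on the M1:

```python
# Smoke test: try to load the GGUF model with the upgraded llama-cpp-python.
# The model path is taken from the error message in this issue.
MODEL_PATH = "models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"

def try_load(model_path: str) -> str:
    """Attempt to load a GGUF model and report the outcome as a string."""
    try:
        from llama_cpp import Llama
    except ImportError:
        return "llama-cpp-python is not installed"
    try:
        # n_gpu_layers=-1 offloads every layer to the GPU (Metal on Apple Silicon)
        Llama(model_path=model_path, n_gpu_layers=-1)
        return "model loaded"
    except Exception as exc:  # e.g. the wrong-number-of-tensors error
        return f"load failed: {exc}"

if __name__ == "__main__":
    print(try_load(MODEL_PATH))
```

If this still prints the `done_getting_tensors` error, the old llama_cpp_python is likely still active in the environment (check with `pip show llama-cpp-python`).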