Open JasonS05 opened 1 year ago
I spent all day with this error X_X The "LLAMA.CPP UNRESPONSIVE FOR 20 SECS. ATTEMPTING TO RESUME GENERATION" message is a result of using the latest version of llama.cpp.
The main issue is that the newest version of llama.cpp is INCOMPATIBLE with gpt-llama.cpp, so you need to go into the llama.cpp releases and download one from June or July. I'm not sure if August works.
I'm using the one from July 14th: https://github.com/ggerganov/llama.cpp/releases?page=31
The new issue is that the older version does not support GGUF files, so you will need to use a .bin file instead for your chat model.
You can get those here:
https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main
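For example, a direct download from that repo might look like the sketch below. The exact file name here is an assumption; check the repo's file list and pick whichever quantization fits your hardware.

```shell
# Sketch of downloading a GGML .bin chat model from TheBloke's repo above.
# MODEL_FILE is an assumed name -- verify it against the repo's file list.
MODEL_FILE="llama-2-7b-chat.ggmlv3.q4_K_M.bin"
MODEL_URL="https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/resolve/main/${MODEL_FILE}"
echo "Would download: ${MODEL_URL}"
# curl -L -o "${MODEL_FILE}" "${MODEL_URL}"   # uncomment to actually fetch
```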
Another issue I now get: the AI sometimes chats with ITSELF, which is very odd. I do not know how to solve this issue yet, but if anyone figures it out, please let me know :)
In the meantime, I am going to try another downgrade to see if it helps.
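One thing worth trying against the self-chat problem, at least in plain llama.cpp, is a reverse prompt, which halts generation when the model starts writing the user's turn itself. This is a sketch only: the "User:" marker is an assumption about your prompt template, and I have not confirmed whether gpt-llama.cpp forwards this flag.

```shell
# Sketch: llama.cpp interactive mode with a reverse prompt.
# "User:" is an assumed turn marker -- match it to your actual template.
./main -m llama-2-7b-chat.ggmlv3.q4_K_M.bin \
  -i \
  --reverse-prompt "User:" \
  -p "User: Hello!
Assistant:"
```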
Thank you! I will try downgrading my version of llama.cpp tomorrow and see how that goes for me.
Hello, I have the same issue, but I'm using dolphin-2.5-mixtral-8x7b.Q5_K_M.gguf. I don't want to use an old version because I need GGUF... Does anyone have another solution?
I'm trying to use this to run Auto-GPT. As a test, before hooking it up to use Auto-GPT, I tried it with Chatbot-UI. However, gpt-llama.cpp keeps locking up with
LLAMA.CPP UNRESPONSIVE FOR 20 SECS. ATTEMPTING TO RESUME GENERATION
whenever the LLM finishes its response. I'm using gpt4-x-alpaca-13B-GGML, which I converted to GGUF with the tools in llama.cpp. Running the model with llama.cpp alone works fine (albeit not the smartest). What can I do to solve this issue?
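For reference, the conversion step I used can be sketched like this. The script name comes from the llama.cpp repo around this time; the flags and the input file name are assumptions, so check the script's `--help` output in your checkout.

```shell
# Sketch: converting a GGML v3 .bin to GGUF with llama.cpp's converter.
# Script name and flags assume a llama.cpp checkout from around this time;
# the model file name is an assumption for illustration.
python convert-llama-ggml-to-gguf.py \
  --input  gpt4-x-alpaca-13b.ggmlv3.q4_0.bin \
  --output gpt4-x-alpaca-13b.q4_0.gguf
```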