keldenl / gpt-llama.cpp

A llama.cpp drop-in replacement for OpenAI's GPT endpoints, allowing GPT-powered apps to run off local llama.cpp models instead of OpenAI.
MIT License
594 stars · 67 forks

llama.cpp unresponsive for 20 seconds #62

Open JasonS05 opened 1 year ago

JasonS05 commented 1 year ago

I'm trying to use this to run Auto-GPT. As a test, before hooking it up to Auto-GPT, I tried it with Chatbot-UI. However, gpt-llama.cpp keeps locking up with `LLAMA.CPP UNRESPONSIVE FOR 20 SECS. ATTEMPTING TO RESUME GENERATION` whenever the LLM finishes its response. I'm using gpt4-x-alpaca-13B-GGML, which I converted to GGUF with the tools in llama.cpp. Using llama.cpp alone, the model works fine (albeit not the smartest). What can I do to solve this issue?
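(For context, the warning reads like gpt-llama.cpp is timing how long the llama.cpp child process has gone without emitting output and trying to resume it after 20 seconds of silence. A minimal Python sketch of that kind of inactivity watchdog — purely illustrative, not the project's actual code:)

```python
import time

class InactivityWatchdog:
    """Tracks how long it has been since the generator last produced output.

    Hypothetical helper for illustration only, not gpt-llama.cpp's
    real implementation.
    """

    def __init__(self, timeout_secs=20.0):
        self.timeout_secs = timeout_secs
        self.last_activity = time.monotonic()

    def poke(self):
        # Call whenever the llama.cpp child process emits a token.
        self.last_activity = time.monotonic()

    def stalled(self):
        # True once no output has been seen for `timeout_secs`.
        return time.monotonic() - self.last_activity > self.timeout_secs
```

A supervisor loop would call `poke()` on every chunk of child stdout and, once `stalled()` turns true, log the warning and attempt to nudge generation along.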

catoapowell commented 1 year ago

I spent all day with this error X_X. `LLAMA.CPP UNRESPONSIVE FOR 20 SECS. ATTEMPTING TO RESUME GENERATION` is a result of the latest version of llama.cpp.

The main issue is that the newest version of llama.cpp is INCOMPATIBLE with gpt-llama.cpp, so you need to go into the llama.cpp releases and download one from June or July. I'm not sure if August works.

I'm using the one from July 14th: https://github.com/ggerganov/llama.cpp/releases?page=31

The new issue is that the older version does not support GGUF files, so you will need to use a GGML .bin file instead for your chat model.

you can get those here.

https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main
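Since GGUF and the older GGML containers have different layouts, you can sanity-check which format a model file actually is before pointing gpt-llama.cpp at it. As far as I know, GGUF files begin with the ASCII magic `GGUF`, while pre-GGUF files begin with the little-endian encodings of the `ggml`/`ggmf`/`ggjt` magics (`lmgg`, `fmgg`, `tjgg` on disk) — a small hedged Python sketch under that assumption:

```python
def model_container_format(path):
    """Guess a llama.cpp model file's container format from its first 4 bytes.

    Assumption: GGUF files start with ASCII "GGUF"; pre-GGUF files start
    with the little-endian encodings of the GGML/GGMF/GGJT magics.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    if magic == b"GGUF":
        return "gguf"  # only newer llama.cpp builds can load this
    if magic in (b"lmgg", b"fmgg", b"tjgg"):
        return "ggml"  # the .bin format the older builds expect
    return "unknown"
```

So running this on, say, one of the TheBloke `.bin` files above should report `ggml`, which is what the July-era llama.cpp build can load.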


Another issue I now get: the AI sometimes chats with ITSELF, which is very odd. I don't know how to solve this yet, but if anyone figures it out, please let me know :)

In the meantime, I am going to try another downgrade to see if it helps.

JasonS05 commented 1 year ago

Thank you! I will try downgrading my version of llama.cpp tomorrow and see how that goes for me.

JeromeRoyer commented 10 months ago

Hello, I have the same issue, but I'm using dolphin-2.5-mixtral-8x7b.Q5_K_M.gguf. I don't want to use an old version because I need GGUF support... Does anyone have another solution?