Additionally, looking into the llama_cpp project suggests that there are quite a few reports of crashes on different hardware with similar behaviour on various models, e.g. https://github.com/abetlen/llama-cpp-python/issues/1326 or https://github.com/abetlen/llama-cpp-python/issues/1319.
My guess is that 0.2.59 might need some more time to bake, and perhaps reverting to 0.2.56 is generally a good idea.
Also on Windows 10 with an Nvidia RTX 3050 Ti Laptop GPU (modified to 8 GB VRAM): with a Yi or Qwen model loaded, it crashed without any error message, showing only "Press any key to continue...", while evaluating the prompt. Downgrading llama-cpp-python to 0.2.56 fixed it.
More untested breakages, yay. How do we "Downgrad[e] llama-cpp-python and llama-cpp-python-cuda to 0.2.56"?
I modified my requirements.txt, deleted the install folder, and installed again. I bring my own venv, so I just needed to re-run pip install -r requirements.txt.
Edit: make sure to run the webui directly with python server.py afterwards.
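For anyone wondering what that looks like concretely, here is a minimal sketch of the manual downgrade, assuming the 0.2.59 version string appears in the wheel pins in your requirements file (the exact file varies by GPU/OS, so check it first):

```sh
# From the text-generation-webui directory, inside your own venv/conda env.
# Assumes requirements.txt pins llama_cpp_python / llama_cpp_python_cuda wheels
# by version string; on Windows, edit the file by hand instead of using sed.
sed -i 's/0\.2\.59/0.2.56/g' requirements.txt
pip install -r requirements.txt
python server.py   # then launch the webui directly, as noted above
```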
This is tricky to describe - the manual commands vary depending on GPU, OS, etc. Further, one_click.py rolls all the update functionality into one step, so you can't, say, roll back a git commit and then update the requirements (note: this functionality would be really useful for testing/debugging!). I think the easiest way to do it would be to use git to roll back to commit 308452b, activate the Conda environment, and run python -c "import one_click; one_click.update_requirements(pull=False)", which should update based on the old requirements file.
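In command form, that rollback would look roughly like the following (a sketch, assuming the default one-click install layout where the Conda env lives under installer_files/env; adjust paths to your setup):

```sh
cd text-generation-webui
git checkout 308452b                    # roll back to the pre-0.2.59 requirements
conda activate ./installer_files/env    # or enter the env via cmd_linux.sh / cmd_windows.bat
python -c "import one_click; one_click.update_requirements(pull=False)"
```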
@oobabooga has just updated llama-cpp-python's version to 0.2.60. Has anyone tried whether it fixes this issue?
Just tested, the answer is no.
Still crashes for me instantly when entering a prompt.
Running all layers on an Nvidia GPU, tensorcores on, Win 11.
Edit: same result if loading on CPU only.
Thanks for testing - I've added the extra info to the bug report.
My guess is that there's some kind of memory safety issue in the CPU component of llama_cpp_python which is causing the problem, probably https://github.com/abetlen/llama-cpp-python/issues/1326. If I can, I'll see if I can bisect that code to narrow it down.
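(If anyone wants to help with that bisect, a generic sketch follows; the tag names and build commands are assumptions, and a source build needs cmake plus whatever CMAKE_ARGS your GPU backend requires.)

```sh
git clone --recurse-submodules https://github.com/abetlen/llama-cpp-python
cd llama-cpp-python
git bisect start v0.2.59 v0.2.56         # known-bad first, then known-good (tag names assumed)
# At each step: sync the vendored llama.cpp, rebuild, and try to reproduce the crash.
git submodule update --init --recursive
pip install -e . --force-reinstall --no-cache-dir
# ...run the crashing prompt against this build, then mark the commit...
git bisect good                           # or `git bisect bad`; repeat until it converges
```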
@oobabooga I would strongly recommend downgrading the requirements for llama_cpp_python/llama_cpp_python_cuda to 0.2.56, as right now the llama.cpp backend is broken for new installs/updates.
Same problem for me.
The issue dgdguk linked above points to a temporary solution: in the 'Model' tab, tick logits_all when you load a model with llamacpp or llamacpp_HF. Despite the warning about making prompt evaluation slower, for now it fixes the crash and (in my case, ymmv) doesn't seem slower than usual.
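If you launch from the command line rather than the UI, the same workaround can probably be applied at startup; a sketch, assuming the logits_all option is also exposed as a CLI flag in your version (verify with python server.py --help):

```sh
# Flag and loader names assumed from the UI options; check `python server.py --help` first.
python server.py --loader llamacpp_HF --model <your-model> --logits_all
```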
As of #5823, llama_cpp_python/llama_cpp_python_cuda have been downgraded to 0.2.56, which appears to have fixed the issue. If you're still having problems, upgrade to the latest version and it should take care of the requirements (at worst, delete the Conda environment and reinstall).
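For reference, the recovery path on a default one-click install looks roughly like this (script names differ per OS and webui version, so treat it as a sketch):

```sh
# Normal path: pull the fixed requirements and reinstall them.
./update_linux.sh            # or update_windows.bat / the update wizard, depending on version
# Worst case: wipe the bundled Conda env and let the start script rebuild it.
rm -rf installer_files/env
./start_linux.sh
```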
I do think that this issue highlights that the project needs a sane way of reverting to an older version, though. I know that development moves quickly, but from any software development point of view, making the only version available through the updater the current bleeding edge is somewhat crazy.
In any case, as this is now fixed, I'm closing the issue.
Agreed. It should be relatively simple to adapt the update_xx.(sh/bat) and start_xx.(sh/bat) scripts to check the requirements on each start (akin to most stable-diffusion webuis), instead of letting the updater scripts fetch the branch's HEAD before updating requirements. It would fix two issues at once: make it easy to revert to / check out a specific commit, and handle requirements changes instead of booting into an unknown env that some change breaks.
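As a rough illustration of that idea, the start script could re-run pip only when the requirements file has actually changed; everything here (marker file name, layout) is hypothetical:

```sh
# Hypothetical per-start check: reinstall requirements only when requirements.txt changed.
REQ_STAMP=installer_files/.requirements.sha256
if ! sha256sum -c "$REQ_STAMP" >/dev/null 2>&1; then
    pip install -r requirements.txt
    sha256sum requirements.txt > "$REQ_STAMP"
fi
python server.py "$@"
```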
Describe the bug
Since the update to llama_cpp_python/llama_cpp_python_cuda 0.2.59, there's a segfault on at least an AMD RX 7900 XT when using models loaded with the llama.cpp loader. This seems to occur once the prompt is sufficiently long, with small models getting maybe a hundred tokens' worth of dialogue and larger models falling over almost immediately. Downgrading llama-cpp-python and llama-cpp-python-cuda to 0.2.56 fixes the issue, so a quick fix for anyone affected is to do that.
According to subsequent reports by @Touch-Night and @jepjoo, 0.2.60 does not fix this issue. Further, while the original report only listed an AMD GPU as the problem, it seems that both the Nvidia GPU and CPU platforms are also impacted. The precise problem seems to be underlying issues in llama_cpp_python which are causing a crash on certain LLM architectures - Mistral models crash almost instantly (context length >= 2), while Phi models take tens of tokens before crashing.
Reproduction
System Info
Subsequent reports also illustrate crashes on Nvidia GPUs and on CPU-only setups.