Open DeadEnded opened 8 months ago
Hello, I have the same bug when using Mistral or Mixtral for text generation. It keeps repeating the last sentence over and over until I restart the container. I tried increasing the repeat penalty, but it has no effect.
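For anyone unsure what the repeat penalty is meant to do: in llama.cpp-style samplers it typically lowers the scores (logits) of tokens that appeared recently, so they become less likely to be picked again. Below is a minimal sketch of that common formulation. The function name and parameters are hypothetical, and this is not serge's or llama-cpp-python's actual code. It may help when judging whether the setting is being applied at all.

```python
from typing import List, Sequence


def apply_repeat_penalty(
    logits: List[float],
    recent_tokens: Sequence[int],
    penalty: float = 1.1,
) -> List[float]:
    """Return a copy of `logits` with recently seen token ids made less likely.

    Illustrative only: positive logits are divided by `penalty`, and
    non-positive logits are multiplied by it, so both move downward
    when penalty > 1.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        if 0 <= tok < len(out):
            if out[tok] > 0:
                out[tok] /= penalty
            else:
                out[tok] *= penalty
    return out


# Token ids 0 and 1 were generated recently; token 2 was not.
penalized = apply_repeat_penalty([2.0, -1.0, 0.5], recent_tokens=[0, 1], penalty=2.0)
```

With a penalty of 1.0 the logits come back unchanged, so a value that is silently ignored or reset to 1.0 would look exactly like "it does nothing".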
I've noticed this for most, if not all, of the models I can test. This bug essentially makes serge useless. Update: Reverting to "ghcr.io/serge-chat/serge:0.8.2" appears to vastly improve or eliminate the repeating issue altogether. Still testing.
This is probably a bug in llama-cpp-python. I will update it this week and do a new release.
Which specific model are you all using? @SolutionsKrezus @fishscene
I'm currently using Mistral 7B and Mixtral, @gaby. I reverted to 0.8.0 and it works like a charm.
Apologies, I'm at work at the moment. All models I tested were affected to some degree. Some more than others.
Off the top of my head: All current mixtral models, at least 2 mistral models, neural chat, one of the medical ones, definitely a few more as well. I did not test anything above 13b as those are beyond my hardware.
I would see random replies marked/flagged as code snippets... and if the model started repeating itself, that was the end of anything useful, as all subsequent replies would only repeat.
Of all the testing I did, getting 10 coherent replies was a major milestone, and even then it sometimes took multiple re-prompts (delete my query and ask it slightly differently) to get to 10. A couple of models started spewing nonsense and repeats on the very first response.
All this to say, testing should be very easy to do. When I reverted to previous serge release, I immediately saw improvement.
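To make that kind of comparison between releases less manual, a small script could flag a reply that has collapsed into a loop. This is a rough sketch, assuming the reply text has already been copied out of the chat. The helper name `count_trailing_repeats` and the threshold of 3 are hypothetical and not part of serge.

```python
import re


def count_trailing_repeats(text: str) -> int:
    """Count how many times the final sentence repeats back-to-back at the end.

    Sentences are split on whitespace after '.', '!' or '?', and on newlines.
    Returns 0 for empty input and 1 when the last sentence is not repeated.
    """
    parts = re.split(r"(?<=[.!?])\s+|\n+", text)
    sentences = [p.strip() for p in parts if p.strip()]
    if not sentences:
        return 0
    last = sentences[-1]
    count = 0
    for sentence in reversed(sentences):
        if sentence != last:
            break
        count += 1
    return count


reply = "The square root of nine is 3. It is 3. It is 3. It is 3."
looping = count_trailing_repeats(reply) >= 3  # threshold chosen arbitrarily
```

Note that a reply cut off mid-sentence will not match the earlier copies exactly, so this can under-count by one.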
Curious, though: OP is using Ryzen, and so am I. Ryzen 1700X, 32GB RAM, no CUDA GPU in use (NVIDIA T400, I think). Using CPU for AI.
Maybe this is isolated to Ryzen CPUs?
Another behavior to note: When asking some censored models a question, they straight up have no reply at all. No detectable CPU was used either. It was like some pre-AI function was like "nope" and didn't pass along my query to the AI model itself. There's a name for this pre-process, but it escapes me at the moment. Not sure if it is a clue either.
I don't think it is a Ryzen-related issue, @fishscene. I have the same problem with an Intel Xeon D-1540 with 32GB RAM and no GPU.
Same issue here. This pretty much renders the software completely useless :(
Can you try ghcr.io/serge-chat/serge:main? Thanks
Bug description
I just pulled the image, spun up a container with default settings. I downloaded the Mistral-7B model, and left everything default. I've tried a few short questions, and the answer repeats the last line until I stop the container.
Steps to reproduce
1) Spin up new container with default settings (from repo)
2) Download Mistral-7B
3) Start a new chat and ask "what is the square root of nine"
Environment Information
Docker version: 25.0.3
OS: Ubuntu 22.04.4 LTS on kernel 5.15.0-97
CPU: AMD Ryzen 5 2400G
Browser: Firefox version 123.0