oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.

my webui can only output “□□□□□□□□□□□□□□□□□□□□□□□□” #4846

Closed: hqqttjiang closed this issue 6 months ago

hqqttjiang commented 10 months ago

Describe the bug

13600KF + 32 GB RAM + RTX 4080 16 GB, running a small 7B model (which has enough GPU VRAM). It only outputs "□□□□□□□□". I have changed models many times, but the result is still the same.

Is there an existing issue for this?

Reproduction

1

Screenshot

[Screenshots: chrome_JyJOJhavJF, chrome_q6INvJqXWq, WindowsTerminal_dC6czpMeE4]

Logs

"internal": [
        [
          "<|BEGIN-VISIBLE-CHAT|>",
            "Chiharu strides into the room with a smile, her eyes lighting up when she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She takes a seat next to you, her enthusiasm palpable in the air\nHey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from. She grins, eyes twinkling with excitement Let's get started!"
        ],
        [
            "hi",
            "\u000f\u0001\u0007\u2585\u2585\u000f\b\u0002\r\u0007\u0000\u000f\u0006\u0005\u0006\u000e\u0000\u2585\u0002\u0005\u0010\u0006\u0005\u0002\n\b\u0010\u0000\f\n\r\t\u0002\u0000\u0010\u0001\t\u000e\u0007\u000e\f\t\u0010\f\u0000\u000f\u0010\u0005\u0000\u2585\u0010\u2585\u0005\r\u0000\u0010\u0002\n\u0004\u0006\u0010\u0001\u0010\u0005\n\u0006\u0010\u0007\u000e\u0001\u2585\u0000\u000f\t\t\u0004\n\u0001\u2585\u0007\t\u2585\u0000\u0006\u0006\u0000\u0010\t\u0007\u000e\u0010\u0002\n\u0004\n\u0005\f\u000e\u2585\t\t\u0006\u0005\t\r\u0007\u0004\t\n\u0010\u0004\u0007\u0005\b\u2585\u0001\u0001\r\u000e\u0001\u2585\u000e\t\u0004\n\r\u0000\u0005\u0007\u000f\u0006\u000f\u0004\u0010\u0005\u0005\t\n\r\r\u0004\u000f\u0010\u0002\f\u0002\u0005\u0004\b\u000f\u2585\u0000\r\u0002\u0010\u0000\r\u0000\u0005\u000f\u000f\u0007\u0000\r\u0004\r\f\t\u0006\u000e\f\u0004\u0004\r\u2585\u0005\u2585\u0004\u2585\f\f\u0006\u0006"

System Info

13600KF + 32 GB RAM + RTX 4080 16 GB, on Windows 11

TheLounger commented 10 months ago

Your webui looks outdated (6+ weeks old); try updating it first.

hqqttjiang commented 10 months ago

> Your webui looks outdated (6+ weeks old); try updating it first.

I already updated to the newest version; it is still the same.

psliva commented 10 months ago

I am seeing very similar issues, and I actually did a full OS format because I assumed I had corrupted drivers or packages, yet the problem persists after a fresh and careful reinstall :(. System specs:

  1. Ubuntu 22.04
  2. 64 GB RAM (AMD)
  3. 2x 3090 RTX FTW3

Example outputs (in my case, it's just one character repeating):

[
            "Hi, who are you?",
            "runner\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f\u000f
[
            "Hi.",
            "Hello############################################################################################################################################################################

(The "#" are actually rendered as music notes, unprintable characters, or otherwise completely unexpected characters sometimes as well)

I only investigated a bit; my findings:

  1. Only happens with GGUF files
  2. Only happens with at least 1 layer offloaded to the NVIDIA GPU; 100% CPU does not reproduce the issue with everything else being the same (a standalone sketch for isolating this follows the list)
  3. Git hash ad00b8eb does NOT reproduce the issue for me; it just happened to be the build I had run the longest. About a month ago I tried to upgrade ooba and ran into this exact problem, but thought it was my environment at the time, so I just reverted to the September build.
  4. Seems to happen on all GGUF files regardless of quantization. I tested:
     a. TheBloke/deepseek-coder-33B-instruct-GGUF | deepseek-coder-33b-instruct.Q4_K_M.gguf
     b. TheBloke/deepseek-coder-33B-instruct-GGUF | deepseek-coder-33b-instruct.Q8_0.gguf
     c. TheBloke/(Samantha1.11-70b) | Q4_K_M.gguf
     d. TheBloke/phind-codellama-34b-v2.Q5_K_M.gguf
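
To isolate finding 2 outside the webui, here is a minimal sketch against llama-cpp-python (the library the webui uses to run GGUF files), assuming it is installed with CUDA support; the model path is a placeholder for any of the files listed above:

```python
from llama_cpp import Llama

MODEL = "deepseek-coder-33b-instruct.Q4_K_M.gguf"  # placeholder: any GGUF file above
PROMPT = "Hi, who are you?"

for n_gpu_layers in (0, 1):  # 0 = pure CPU; 1 = a single layer offloaded
    llm = Llama(model_path=MODEL, n_gpu_layers=n_gpu_layers, seed=1)
    out = llm(PROMPT, max_tokens=32)
    print(f"n_gpu_layers={n_gpu_layers}:", repr(out["choices"][0]["text"]))
```

If the n_gpu_layers=0 run is clean and the n_gpu_layers=1 run is garbled, the fault is in the CUDA offload path of the bundled llama.cpp rather than in the webui itself.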

My solution: switch to GPTQ (works just fine) or AWQ (haven't tested, but I assume it will work). If I had more time, I'd try quantizing the models myself, debugging, or reverting to older llama.cpp packages.
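
For completeness, a minimal sketch of that GPTQ route through plain transformers, assuming optimum, auto-gptq and accelerate are installed; the repo id is an assumed GPTQ counterpart of the GGUF models above, not one verified in this thread:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "TheBloke/deepseek-coder-33B-instruct-GPTQ"  # assumed repo id

tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

inputs = tok("Hi, who are you?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```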

SergeiKarulin commented 8 months ago

Same for me, with various Llama models.

github-actions[bot] commented 6 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.