oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Strange llama.cpp_HF bug where the first token is chosen randomly #5186

Closed: kalomaze closed this issue 6 months ago

kalomaze commented 8 months ago

Describe the bug

It seems the first token gets sampled before the context has finished processing, or something along those lines. I'm not sure whether this applies to the regular llama.cpp backend as well or only to llama.cpp_HF. I never get this in koboldcpp.

Is there an existing issue for this?

Reproduction

It seems inconsistent; I haven't found a common trigger yet, and reloading the model sometimes fixes it. It's odd. Maybe it's a race condition or something where the first token is sampled before the model is ready? (A rough sketch of that failure mode is below.)
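To make the hypothesis concrete, here is a minimal, purely hypothetical Python sketch (not actual text-generation-webui or llama.cpp code; all names are made up): if the sampler reads the logits buffer before prompt evaluation has filled it, the first token is effectively drawn from stale values and looks random, while every later token is sampled from properly computed logits.

```python
import threading
import random
import time

# Hypothetical illustration only. Vocabulary, Model, and the timing are invented
# to show how sampling before prompt processing finishes could yield a random
# first token.
VOCAB = ["the", "a", "cat", "dog", "###", "?!"]

class Model:
    def __init__(self):
        # Stale contents left over from initialization or a previous request.
        self.logits = [random.random() for _ in VOCAB]
        self.prompt_done = threading.Event()

    def evaluate_prompt(self):
        time.sleep(0.5)                      # simulate slow prompt processing
        self.logits = [0, 0, 5.0, 0, 0, 0]   # "cat" is the intended next token
        self.prompt_done.set()

    def sample_first_token(self, wait_for_prompt):
        if wait_for_prompt:
            self.prompt_done.wait()          # correct: block until the prompt is processed
        # Greedy sample from whatever is currently in the logits buffer.
        return VOCAB[self.logits.index(max(self.logits))]

model = Model()
threading.Thread(target=model.evaluate_prompt).start()

print("without waiting:", model.sample_first_token(wait_for_prompt=False))  # arbitrary token
print("after waiting:  ", model.sample_first_token(wait_for_prompt=True))   # "cat"
```

If something like this is happening, it would also explain why reloading sometimes fixes it: the stale state in the buffer just happens to be different after a reload.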

Screenshot

(Four screenshots attached.)

Logs

N/A

System Info

N/A
github-actions[bot] commented 6 months ago

This issue has been closed due to inactivity for 2 months. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.