Describe the bug
The first token appears to be sampled before the context (prompt) has finished processing, or something along those lines. I'm not sure whether this affects the regular llama.cpp backend or only llamacpp_HF. I never see this behavior in koboldcpp.
Is there an existing issue for this?
[X] I have searched the existing issues
Reproduction
The problem is inconsistent and I haven't found a reliable trigger yet; reloading the model sometimes fixes it. It may be a race condition where sampling starts before the model has finished evaluating the prompt. A small illustrative sketch of the kind of race I mean is below.
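For illustration only: a minimal, self-contained Python sketch of the suspected class of bug, where one code path samples the first token as soon as generation is requested instead of waiting for prompt evaluation to finish. All names here are hypothetical stand-ins, not code from the webui or llama.cpp.

```python
import threading
import time

# Event set once prompt (context) evaluation has produced valid logits.
logits_ready = threading.Event()
state = {"logits": None}  # None until prompt evaluation fills it in

def evaluate_prompt():
    # Stand-in for prompt/context processing; takes noticeable time.
    time.sleep(0.5)
    state["logits"] = [0.1, 0.7, 0.2]
    logits_ready.set()

def sample_first_token(wait_for_prompt: bool):
    if wait_for_prompt:
        logits_ready.wait()       # correct path: block until eval finishes
    logits = state["logits"]      # buggy path may read uninitialized state
    if logits is None:
        return "<garbage>"        # first token sampled before model is ready
    return max(range(len(logits)), key=logits.__getitem__)

threading.Thread(target=evaluate_prompt).start()
print("buggy:  ", sample_first_token(wait_for_prompt=False))
print("correct:", sample_first_token(wait_for_prompt=True))
```

If the real backend has a similar ordering gap, it would also explain why the symptom is intermittent: whether the first sample lands before or after prompt evaluation completes depends on timing.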
Screenshot
Logs
System Info