Closed: yesbroc closed this issue 1 year ago
(fixed for instruct mode; chat mode is still broken)
Have you solved this? I am seeing the exact same issue: always 0 tokens, no matter the prompt, with a llama.cpp quantized model.
Output generated in 0.24 seconds (0.00 tokens/s, 0 tokens, context 66, seed 1942958422)
If I run with --no-stream, I see this error whenever I try to submit a chat message or prompt. If I run normally, I get the 0-tokens output but no error.
Traceback (most recent call last):
  File "/home/mwhitford/src/text-generation-webui/modules/text_generation.py", line 308, in generate_reply_custom
    reply = shared.model.generate(context=question, **generate_params)
  File "/home/mwhitford/src/text-generation-webui/modules/llamacpp_model.py", line 77, in generate
    for completion_chunk in completion_chunks:
  File "/home/mwhitford/miniconda3/envs/textgen/lib/python3.10/site-packages/llama_cpp/llama.py", line 647, in _create_completion
    raise ValueError(
ValueError: Requested tokens exceed context window of 2048
Output generated in 0.01 seconds (0.00 tokens/s, 0 tokens, context 68, seed 964245376)
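For context, the ValueError comes from a pre-generation budget check: llama-cpp-python refuses to start if the prompt tokens plus the requested new tokens cannot fit in the 2048-token context window. The sketch below is only an illustration of that check, not the library's actual code; the function name and numbers are made up for the example.

```python
# A minimal sketch of the budget check that (approximately) produces the
# ValueError above. Not the actual llama-cpp-python source, just an
# illustration of why requesting 2000 new tokens fails even on a short prompt.

def check_token_budget(prompt_token_count: int, max_new_tokens: int, n_ctx: int = 2048) -> None:
    """Refuse to generate if prompt + requested new tokens exceed the context window."""
    if prompt_token_count + max_new_tokens > n_ctx:
        raise ValueError(f"Requested tokens exceed context window of {n_ctx}")

# The log above shows "context 66" with max_new_tokens set to 2000 in the UI:
# 66 + 2000 = 2066 > 2048, so the check fires before a single token is generated.
try:
    check_token_budget(prompt_token_count=66, max_new_tokens=2000)
except ValueError as err:
    print(err)  # -> Requested tokens exceed context window of 2048
```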
@michaelwhitford Same here with --no-stream. However, if streaming is on, I sometimes get other errors and the program stops.
why is webui so broken 😭
I have a temporary workaround: setting max tokens to 200 (the default) works. I haven't tried long conversations yet, and Continue still doesn't work. Long prompts seem fine; I couldn't be bothered trying even longer ones because it's llama.cpp (very slow).
I can confirm I lowered max tokens from 2000 to 1800 and it's working for llama.cpp models again.
Still, Continue doesn't work. Also a shame we don't get the full 2000 tokens :c
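A sketch of that clamping workaround, in case it helps anyone: instead of hard-coding 1800 in the UI, compute the largest reply length that still fits next to the prompt. `clamp_max_new_tokens` is a hypothetical helper for illustration, not part of text-generation-webui.

```python
# Hypothetical helper illustrating the workaround above: instead of a fixed
# max_new_tokens (2000), clamp it so the prompt plus the reply always fits
# inside the model's context window. Not part of text-generation-webui.

def clamp_max_new_tokens(prompt_token_count: int, requested: int,
                         n_ctx: int = 2048, reserve: int = 8) -> int:
    """Largest max_new_tokens that still fits next to the prompt (never negative)."""
    available = n_ctx - prompt_token_count - reserve
    return max(0, min(requested, available))

# With the 68-token context from the log, a 2000-token request is clamped to
# 1972, which avoids the ValueError without manually lowering the slider.
print(clamp_max_new_tokens(prompt_token_count=68, requested=2000))  # -> 1972
```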
ooba fr isn't gonna fix this lol
See also llama-cpp-python issue #307 for a quantification of this problem.
Do the following if you're still having this problem: update llama-cpp-python to version 0.1.61 or higher, and pull the most recent update of text-generation-webui.
If none of these work, reinstall from scratch (which is what I did).
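If you are unsure which llama-cpp-python build the textgen conda environment actually picked up after updating, a quick check from that same environment (assuming the package exposes the conventional `__version__` attribute):

```python
# Quick sanity check that the textgen env actually picked up the new build.
# Assumes llama-cpp-python exposes the conventional __version__ attribute.
import llama_cpp

print(llama_cpp.__version__)  # expect 0.1.61 or newer after updating
```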
Describe the bug
Kind of like Bing deleting its messages, the AI deletes its own message with 0 tokens generated. Much like #2204.
Reproduction
1. Update to llama.cpp 0.1.53.
2. Load a GGML v3 model.
3. Type a message. (message redacted)