gmonair opened 9 months ago
I'm running into the same error using LeoLM/leo-hessianai-13b-chat - not using llama.cpp though.
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
Similar overall behaviour: some (simple) settings work, others fail. With the base Llama I don't get this error.
At first I thought it was due to a mismatch between the tokenizer's vocab size and the model's vocab size.
I did not get to test it thoroughly, but certain input lengths or characters seem to trigger the error.
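(For reference, a minimal sketch of that vocab-size comparison, using Hugging Face transformers; the model id is the one mentioned above, and this check is illustrative rather than taken from the original report:)

```python
# Compare the tokenizer's vocab size with the model config's vocab size;
# a mismatch would mean the model can emit ids the tokenizer can't decode.
from transformers import AutoConfig, AutoTokenizer

model_id = "LeoLM/leo-hessianai-13b-chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
config = AutoConfig.from_pretrained(model_id)

print("tokenizer vocab size:", len(tokenizer))  # tokens the tokenizer knows
print("model vocab size:", config.vocab_size)   # embedding rows in the model
```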
I have a similar problem; in my case, I simply can't use `stop` or `stop_regex`, so even the "simple" example of
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + gen(stop='\n') + '\n'
gives the error
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
Meanwhile, just using `max_tokens` works:
lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
lm += 'Equivalent arithmetic expression: ' + gen(max_tokens=15) + '\n'
and returns the expected output:
Problem: Luke has a hundred and six balls. He then loses thirty six. Equivalent arithmetic expression: 106 - 36
Solution: Luke has
I tried to debug the issue by inserting some print statements; however, the error site was too complicated and I can't really follow the code.
However, I did add a print statement, and when the error is thrown, `token_pos` was 0 and `sample_token` was repeating the last token of my prompt, which would be removed when called with `max_tokens` (I guess it was removed by token healing?); see the output in the details below.
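(As a hedged illustration of that observation, one can tokenize the prompt tail directly with llama-cpp-python; the model path is the one from the transcript below, and `vocab_only=True` loads only the vocabulary, not the weights:)

```python
# Show the last token of the prompt that precedes gen(stop='\n'):
# per the transcript, it is the lone trailing space that reappears
# as the sampled token at token_pos 0.
from llama_cpp import Llama

llm = Llama("/mnt/models/mistral-7b-openorca.Q5_K_M.gguf", vocab_only=True)
ids = llm.tokenize(b"Equivalent arithmetic expression: ", add_bos=False)
print(ids[-1], llm.detokenize([ids[-1]]))  # expected: some id followed by b' '
```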
I added the highlighted line (a print of `token_pos` and the sampled token's bytes) in guidance/models/_local.py:
www@8bf11758c665:/var/www/app$ python
Python 3.9.18 (main, Nov 1 2023, 14:31:33)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from guidance import models, gen, select
>>> llama2 = models.LlamaCpp('/mnt/models/mistral-7b-openorca.Q5_K_M.gguf', n_ctx=(1024*4))
>>> lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
>>> lm += 'Equivalent arithmetic expression: ' + gen(max_tokens=15) + '\n'
None b'Equ'
None b'ivalent'
None b' ar'
None b'ith'
None b'metic'
None b' expression'
None b':'
1 b' '
1 b'1'
1 b'0'
1 b'6'
2 b' -'
1 b' '
1 b'3'
1 b'6'
1 b'\n'
1 b'\n'
1 b'S'
7 b'olution'
1 b':'
5 b' Luke'
4 b' has'
4 b'\n'
0 b'\n'
>>> print(lm)
Problem: Luke has a hundred and six balls. He then loses thirty six.
Equivalent arithmetic expression: 106 - 36
Solution: Luke has
>>>
>>> lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
>>> lm += 'Equivalent arithmetic expression: ' + gen(stop='\n') + '\n'
None b'Equ'
None b'ivalent'
None b' ar'
None b'ith'
None b'metic'
None b' expression'
None b':'
None b' '
0 b' '
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.9/site-packages/guidance/models/_model.py", line 242, in __add__
out = lm._run_stateless(value)
File "/usr/local/lib/python3.9/site-packages/guidance/models/_model.py", line 382, in _run_stateless
for new_bytes, is_generated, new_bytes_log_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
File "/usr/local/lib/python3.9/site-packages/guidance/models/_local.py", line 375, in __call__
assert parser.matched(), "We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?"
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
>>>
>>> lm = llama2 + 'Problem: Luke has a hundred and six balls. He then loses thirty six.\n'
>>> # This time without the colon and space
>>> lm += 'Equivalent arithmetic expression' + gen(stop='\n') + '\n'
None b'Equ'
None b'ivalent'
None b' ar'
None b'ith'
None b'metic'
None b' expression'
0 b' expression'
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.9/site-packages/guidance/models/_model.py", line 242, in __add__
out = lm._run_stateless(value)
File "/usr/local/lib/python3.9/site-packages/guidance/models/_model.py", line 382, in _run_stateless
for new_bytes, is_generated, new_bytes_log_prob, capture_groups, capture_group_log_probs, new_token_count in gen_obj:
File "/usr/local/lib/python3.9/site-packages/guidance/models/_local.py", line 375, in __call__
assert parser.matched(), "We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?"
AssertionError: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete?
Versions: guidance==0.1.4, llama_cpp_python==0.2.19
@gmonair Sorry to bother you, but after some digging it started working for me. Can you try the following and see if it works for you?
Basically, from what I tried, models that don't use `<s>` for the BOS token and `</s>` for the EOS token will fail, because these were hardcoded in guidance. Coincidentally, this commit changed the related code, and it solved the problem for me. Alternatively, you can try using "neural-chat-7b-v3-1.Q5_K_M.gguf", which uses the "correct" BOS and EOS tokens. If it also works for you, then great.
(FYI, if you load the model directly with llama.cpp or llama-cpp-python, it will print the model's information, including the BOS and EOS tokens used.)
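(A minimal sketch of that check with llama-cpp-python; the model path is an assumption:)

```python
# Print the BOS/EOS token ids declared by the GGUF file; loading the
# model also logs this metadata to stderr, as mentioned above.
from llama_cpp import Llama

llm = Llama("/mnt/models/neural-chat-7b-v3-1.Q5_K_M.gguf", vocab_only=True)
print("BOS token id:", llm.token_bos())
print("EOS token id:", llm.token_eos())
```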
However, since @hanszahm isn't using llama.cpp, this probably doesn't directly solve their problem, though it might be a similar issue.
I can confirm: without llama.cpp, using Mixtral Instruct GPTQ, I get:
Exception: We can't consume any more tokens, but we are not yet done! Perhaps your model's token set is incomplete? This happened after the prompt: ...
In my use case, this seems related to non-English characters that are valid Unicode but not in Mixtral's tokenizer.
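(A hedged sketch of a round-trip check that can surface this, using Hugging Face transformers; the model id and sample characters are assumptions:)

```python
# Characters that are valid Unicode but absent from the vocab only
# survive encode/decode via byte fallback; a broken round trip hints
# at the incomplete-token-set failure mode above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B-Instruct-v0.1")
for ch in ["é", "字", "🙂"]:
    ids = tokenizer.encode(ch, add_special_tokens=False)
    print(repr(ch), ids, repr(tokenizer.decode(ids)))
```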
The bug
When using a Mistral-7B-based model, some basic examples work, while the more advanced ones error out. Using a Llama-based model works on all examples.
To Reproduce
This snippet works as expected:
Expected output: the model correctly answers 7.
The following two snippets don't work:
Error:
Error:
Both snippets work when using the llama2-based model.
System info (please complete the following information):
Guidance Version (guidance.__version__): 0.1.1