microsoft / llguidance

Low-level Guidance Parser
MIT License

`usize` overflow in `num_tokens` when completing a `Gen` with nonempty `stop_regex` #6

Closed hudson-ai closed 2 months ago

hudson-ai commented 2 months ago

When completing a `Gen` via a nonempty `stop_regex` (I can't seem to reproduce this when the stop regex is empty or completion happens via EOS), the final 'text' object in the progress list reports a `num_tokens` value of nearly 2**64.

import json

from guidance.models._byte_tokenizer import ByteTokenizer
from guidance._grammar import Gen
from llguidance import LLInterpreter, LLTokenizer, TokenizerWrapper

# Gen with a nonempty stop_regex: matching "bar" ends the generation,
# and the matched bytes are hidden from the output.
g = Gen(body_regex=r".*", stop_regex=r"bar", max_tokens=10)

interp = LLInterpreter(
    LLTokenizer(TokenizerWrapper(ByteTokenizer())),
    json.dumps(g.ll_serialize()),
)

s = "foobar"
i = 0
interp.process_prompt([])
while True:
    mask, resp = interp.mid_process()
    resp = json.loads(resp)
    print(resp)
    if resp["stop"]:
        break

    if mask is not None:
        # ByteTokenizer maps each byte to its own token, so ord() is the token id
        tok = ord(s[i])
        i += 1
    else:
        tok = None

    bt, toks = interp.post_process(tok)

stdout:

{'progress': [{'object': 'text', 'str': '', 'hex': '', 'log_prob': 0.0, 'num_tokens': 0, 'is_generated': False, 'stats': {'runtime_us': 139, 'rows': 0, 'definitive_bytes': 0, 'lexer_ops': 260, 'all_items': 0, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': 'f', 'hex': '66', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 7, 'rows': 0, 'definitive_bytes': 1, 'lexer_ops': 258, 'all_items': 0, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': 'o', 'hex': '6f', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 4, 'rows': 0, 'definitive_bytes': 1, 'lexer_ops': 258, 'all_items': 0, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': 'o', 'hex': '6f', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 4, 'rows': 0, 'definitive_bytes': 1, 'lexer_ops': 258, 'all_items': 0, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': '', 'hex': '', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 30, 'rows': 0, 'definitive_bytes': 1, 'lexer_ops': 258, 'all_items': 0, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': '', 'hex': '', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 29, 'rows': 1, 'definitive_bytes': 1, 'lexer_ops': 258, 'all_items': 1, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': '', 'hex': '', 'log_prob': 0.0, 'num_tokens': 1, 'is_generated': True, 'stats': {'runtime_us': 8, 'rows': 1, 'definitive_bytes': 1, 'lexer_ops': 0, 'all_items': 1, 'hidden_bytes': 0}}], 'stop': False, 'temperature': 0.0}
{'progress': [{'object': 'text', 'str': '', 'hex': '', 'log_prob': 0.0, 'num_tokens': 18446744073709551613, 'is_generated': False, 'stats': {'runtime_us': 7, 'rows': 0, 'definitive_bytes': 0, 'lexer_ops': 256, 'all_items': 0, 'hidden_bytes': 0}}, {'object': 'final_text', 'str': 'foo', 'hex': '666f6f', 'stop_reason': 'NoExtension'}], 'stop': True, 'temperature': 0.0}
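For reference, the reported value is exactly the 64-bit two's-complement wrap of -3, which suggests `num_tokens` is produced by an unchecked subtraction that goes below zero. This is a sketch of the arithmetic, not a confirmed diagnosis from the source:

```python
# 18446744073709551613 is what -3 looks like after a subtraction
# underflows an unsigned 64-bit counter (usize on 64-bit targets).
# Python models u64 wraparound with mod 2**64.
WRAPPED = (-3) % 2**64
assert WRAPPED == 18446744073709551613

# A saturating subtraction (clamping at zero) would avoid the wrap:
def saturating_sub(a: int, b: int) -> int:
    return a - b if a >= b else 0

assert saturating_sub(0, 3) == 0
```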

Bonus issue: note that a token is counted on every step, even though "bar" is hidden...

hudson-ai commented 2 months ago

@mmoskal tagging this one as relatively high priority as it directly affects user experience and prevents follow-up generation (we immediately hit the token limit...)
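In the meantime, a consumer-side workaround is to treat implausibly large `num_tokens` values as underflow and clamp them to zero. `sanitize_num_tokens` below is a hypothetical helper, not part of llguidance:

```python
U64_HALF = 2**63  # anything at or above this is assumed to be a wrapped negative

def sanitize_num_tokens(progress_entry: dict) -> int:
    """Return a sane token count, clamping wrapped-around values to 0."""
    n = progress_entry.get("num_tokens", 0)
    return 0 if n >= U64_HALF else n

entry = {"object": "text", "str": "", "num_tokens": 18446744073709551613}
assert sanitize_num_tokens(entry) == 0
```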

mmoskal commented 2 months ago

as for the bonus - a token is generated, so I think if you have "max_tokens" set somewhere, you want to see that this happened (even if the token happens to be hidden)

hudson-ai commented 2 months ago

> as for the bonus - a token is generated, so I think if you have "max_tokens" set somewhere, you want to see that this happened (even if the token happens to be hidden)

totally fair!