guidance-ai / guidance

A guidance language for controlling large language models.

Regression in 0.1.15 Causes Incorrect Token Slicing in 'Role' Blocks #857

Open FoxBuchele opened 1 month ago

FoxBuchele commented 1 month ago

The bug
A regression was introduced in version 0.1.15 of the Guidance library. Within 'role' blocks, the responses stored via capture() are missing tokens at the beginning and include extra tokens at the end, indicating that the output is being sliced incorrectly. This issue does not occur in version 0.1.14.
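
To make the symptom concrete, here is a minimal, library-free sketch (the string, offsets, and shift amount are hypothetical illustrations, not taken from Guidance internals) of how a capture whose start/end offsets are shifted by the length of a trailing stop token would produce exactly the kind of output shown below - text missing at the front and "[/INST]" appended at the end:

full = "[INST] This is a test of something we're going to move from place to place. [/INST]"
start, end = full.index("This"), full.index(" [/INST]")
print(full[start:end])                    # correct capture: the full sentence
shift = len(" [/INST]")                   # hypothetical shift, equal to the stop-token length
print(full[start + shift:end + shift])    # shifted capture: drops the front, keeps the stop token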

To Reproduce
Full working code snippet, including the LLM load step so the model being used is clear:

import os

import guidance
from guidance import models, user, capture

relpath = r".\llmdata~\mistral-7b-instruct-v0.2.Q4_K_M.gguf"  # raw string so the backslashes are not treated as escape sequences

fullpath = os.path.abspath(relpath)
if not os.path.isfile(fullpath):
    print("ERROR! The model file is not at "+relpath+" and we tried to create the absolute path: "+fullpath)

# 0.1.14
# Works as expected! 
#guidance_lm = models.MistralChat(fullpath, n_gpu_layers=-1, temperature=0.7, max_tokens=8194, n_batch=8194, top_p=0.95, n_ctx=8194, verbose=True, echo=False)

# 0.1.15
# Broken output (unwanted stop tokens included at the end; tokens missing from the beginning, equal in number to the extra stop tokens)
guidance_lm = models.LlamaCpp(fullpath, n_gpu_layers=-1, temperature=0.7, max_tokens=8194, n_batch=8194, top_p=0.95, n_ctx=8194, verbose=True, echo=False)

def tokenizer_issue():
    TokenizerTest = guidance_lm + capture("""This is a test of something we're going to move from place to place.""", "remember")
    alt_string = TokenizerTest["remember"]
    with user():
        test_lm = guidance_lm + capture(TokenizerTest["remember"], "output")
        test_two = guidance_lm + capture(alt_string, "output_two")
        # Also occurs when NOT capturing LM state...
        test_three = guidance_lm + capture("This is an example to show that this issue only occurs within the role blocks...", "output_three")

    confirm_lm = guidance_lm + capture(TokenizerTest["remember"], "finale")

    # Prints normally
    print("Works at start:")
    print(TokenizerTest["remember"])
    # Missing beginning, extra output characters that shouldn't be included at the end
    print("First error:")
    print(test_lm["output"])
    print("Second error:")
    print(test_two["output_two"])
    print("Third error:")
    print(test_three["output_three"])
    # Prints normally
    print("Works:")
    print(confirm_lm["finale"])

    #0.1.14 output:
    # Running LM Test...
    # ---------
    # Works at start:
    # This is a test of something we're going to move from place to place.
    # First error:
    # This is a test of something we're going to move from place to place.
    # Second error:
    # This is a test of something we're going to move from place to place.
    # Third error:
    # This is an example to show that this issue only occurs within the role blocks...
    # Works:
    # This is a test of something we're going to move from place to place.

    #0.1.15 output:
    # Running LM Test...
    # ---------
    # Works at start:
    # This is a test of something we're going to move from place to place.
    # First error:
    # a test of something we're going to move from place to place. [/INST]
    # Second error:
    # a test of something we're going to move from place to place. [/INST]
    # Third error:
    # an example to show that this issue only occurs within the role blocks... [/INST]
    # Works:
    # This is a test of something we're going to move from place to place.

if __name__ == "__main__":
    print("Running LM Test... ")
    print("---------")
    tokenizer_issue()


FoxBuchele commented 1 month ago

Looking into it further, this appears to affect only the capture() function when used on plain text; using capture() on functions or grammars does not appear to trigger the issue in most circumstances.

E.g. this functions as expected:

from guidance import gen

gen_test = guidance_lm + "Please repeat the following sentence: 'The quick brown fox jumped over the lazy dog.'"

with user():
    gen_test += capture(gen(), "response")

print(gen_test["response"])
# Prints 'The quick brown fox jumped over the lazy dog.' correctly, with no missing tokens and no incorrectly appended tokens.
FoxBuchele commented 1 month ago

I believe I have fixed this issue with pull request 858: [Edit: It was not, in fact, fixed.]

https://github.com/guidance-ai/guidance/pull/858

I would definitely appreciate some eyes on this from folks more involved in the code base - it's possible the original code was written this way for reasons I don't have insight into.

FoxBuchele commented 1 month ago

Updated my pull request with a much more robust solution, which adapts some functionality from _model.py. Might it be a better idea to share that functionality (or at least the regex) between the two files?
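
For reference, here is a rough sketch of the kind of shared helper I have in mind (the helper name and the regex are hypothetical illustrations, not the actual code in _model.py or in the PR):

import re

# Hypothetical pattern and helper, shown only to illustrate keeping the role-tag
# handling in one shared place; the real pattern lives in _model.py / PR 858.
ROLE_TAG_PATTERN = re.compile(r"\[INST\]|\[/INST\]|<\|im_start\|>\w*\n?|<\|im_end\|>|</?s>")

def strip_role_tags(text: str) -> str:
    """Remove chat-template control tokens so a capture holds only the user-visible text."""
    return ROLE_TAG_PATTERN.sub("", text).strip()

print(strip_role_tags("[INST] This is a test of something we're going to move from place to place. [/INST]"))
# -> "This is a test of something we're going to move from place to place."

That way the role-block capture path and _model.py would agree on what counts as a control token.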