Open FoxBuchele opened 1 month ago
Looking further into it, this appears to be an issue only with the `capture()` function when used on text; using `capture()` on functions or grammars does not appear to cause the issue in most circumstances.
E.g. this functions as expected:

```python
gen_test = guidance_lm + "Please repeat the following sentence: 'The quick brown fox jumped over the lazy dog.'"
with user():
    gen_test += capture(gen(), "response")
print(gen_test["response"])
# Prints 'The quick brown fox jumped over the lazy dog.' correctly,
# with no missing tokens or incorrect appended tokens.
```
I believe I have fixed this issue with pull request 858 [Edit: it was not, in fact, fixed]:
https://github.com/guidance-ai/guidance/pull/858
I would definitely appreciate some eyes on this from folks more involved in the code base - it's possible this was done this way for reasons I don't have insight into.
Updated my pull request with a much more robust solution, which adapts some functionality from _model.py. It might be a better idea to share that functionality (or at least the regex) between both files?
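To make the sharing suggestion concrete, a common pattern is to hoist the role-marker regex into one helper that both call sites import. This is only a minimal sketch; the regex, the ChatML-style markers, and the function name are hypothetical, not guidance's actual code:

```python
import re

# Hypothetical shared helper: one regex for stripping ChatML-style role
# markers, so _model.py and the capture path cannot drift apart.
ROLE_TAG_RE = re.compile(r"<\|im_start\|>\w+\n|<\|im_end\|>")

def strip_role_tags(text: str) -> str:
    """Remove role-tag markers from a raw model output string."""
    return ROLE_TAG_RE.sub("", text)

print(strip_role_tags("<|im_start|>user\nHello!<|im_end|>"))  # Hello!
```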
**The bug**
A regression was introduced in version 0.1.15 of the Guidance library. Within 'role' blocks, the responses stored via `capture` are missing tokens at the beginning and include extra tokens at the end, indicating incorrect slicing of the output. This issue does not occur in version 0.1.14.
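The symptom can be illustrated with plain string slicing. This is only an illustration of off-by-N slicing, not guidance's actual internals, and the role-marker strings are hypothetical:

```python
# Raw model output wrapped in hypothetical role markers.
raw = "<|user|>The quick brown fox jumped over the lazy dog.<|end|>"
prefix = "<|user|>"
suffix = "<|end|>"

# Correct slicing strips exactly the role markers.
correct = raw[len(prefix):len(raw) - len(suffix)]

# An off-by-four slice: starts too late (drops "The ") and ends too
# late (keeps the closing marker) - the symptom described above.
buggy = raw[len(prefix) + 4:]

print(correct)  # The quick brown fox jumped over the lazy dog.
print(buggy)    # quick brown fox jumped over the lazy dog.<|end|>
```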
**To Reproduce**
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.
**System info (please complete the following information):**
- The output of `guidance.__version__`: 0.1.15