guidance-ai / guidance

A guidance language for controlling large language models.

Stream() does not obey the roles function #943

Open peterwilli opened 3 months ago

peterwilli commented 3 months ago

The bug

When using roles such as with assistant(): combined with stream(), the roles are not obeyed and all output is squished into the last role.

To Reproduce

Give a full working code snippet that can be pasted into a notebook cell or Python file. Make sure to include the LLM load step so we know which model you are using.

from guidance import models, gen
from guidance import user, system, assistant

# Load a local GGUF model (any model reproduces this).
model = models.LlamaCpp(
    "./models/any_model.gguf",
    n_gpu_layers=-1,
    temperature=0,
    n_ctx=8192
)

lm = model.stream()
with user():
    lm += "Hey there!"

with assistant():
    lm += gen()
    for token in lm:
        print(token)

What I get:

<|start_header_id|>assistant<|end_header_id|>

Hey there! How's it going?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Hey there! How's it going? What's on your mind? Do you have<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Hey there! How's it going? What's on your mind? Do you have any questions or topics you'd like to discuss<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Hey there! How's it going? What's on your mind? Do you have any questions or topics you'd like to discuss? I'm here to help and provide information<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

Note that all of the output lands under the assistant role, and that the generated text is a completion of what should have been the user() role's message ("Hey there!").
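
For comparison, the same prompt without stream() behaves as expected and keeps the role tags intact (a minimal sketch, using the same placeholder model path as above):

from guidance import models, gen
from guidance import user, assistant

model = models.LlamaCpp(
    "./models/any_model.gguf",
    n_gpu_layers=-1,
    n_ctx=8192
)

lm = model  # no .stream() here
with user():
    lm += "Hey there!"  # correctly wrapped in the user role

with assistant():
    lm += gen()  # generation lands under the assistant role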

(Temporary) workaround

I found a way around this after reading the source code and attempting to fix it myself (I couldn't find a way to do it without breaking too many things, so I'll wait for a dev with more experience to land a final fix). It's not the cleanest, but it'll do for now:

from guidance import models, gen
from guidance import user, assistant

model = models.LlamaCpp(
    "./models/any_model.gguf",
    n_gpu_layers=-1,
    temperature=0,
    n_ctx=8192
)

def wrap_role_start(lm):
    # Manually append the opener text of every currently open role block,
    # since the streamed model doesn't apply it on its own.
    for block in models.LlamaCpp.open_blocks.keys():
        lm += block.opener
    return lm

def wrap_role_end(lm):
    # Same for the closer text.
    for block in models.LlamaCpp.open_blocks.keys():
        lm += block.closer
    return lm

lm = model.stream()
with user():
    lm = wrap_role_start(lm)
    lm += "Hey there!"
    lm = wrap_role_end(lm)

with assistant():
    lm = wrap_role_start(lm)
    lm += gen()
    lm = wrap_role_end(lm)

# Iterate outside of any role block (see note below).
for token in lm:
    print(token)

Note that the token iteration happens outside of any role block; this is important for the workaround to work.

hudson-ai commented 3 months ago

@peterwilli thank you for identifying this issue and providing a workaround! It would be nice if we could do this a bit more "automatically" for our users -- I think that the context-manager implementation needs some attention... will continue thinking about this :)
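
My current read (an assumption from skimming the source, not a confirmed diagnosis) is that it's a timing problem: the role context managers record open blocks on the Model class, but stream() defers the += operations until iteration, by which point the with blocks have already exited. A toy illustration of that pattern (not the actual guidance internals):

from contextlib import contextmanager

open_blocks = []  # stands in for Model.open_blocks

@contextmanager
def role(name):
    # Mimics how a role block is only "open" for the duration of the with.
    open_blocks.append(name)
    try:
        yield
    finally:
        open_blocks.pop()

deferred = []  # stands in for the stream's recorded += operations

def lazy_add(text):
    # Record the work; the open-block lookup happens later, on replay.
    deferred.append(lambda: (list(open_blocks), text))

with role("user"):
    lazy_add("Hey there!")  # open_blocks == ["user"] right now...

for thunk in deferred:
    print(thunk())  # ...but prints ([], 'Hey there!') -- replayed too late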

@nking-1 tagging you for interest

peterwilli commented 2 months ago

Thank you! Yeah! That was exactly where my thoughts went, but I wasn't experienced enough with your source code to make such a change to the context manager. I decided to settle for this workaround for now, but of course automating this so that the behavior is the same for streaming and non-streaming would be much better.