guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

[WIP] Structured model state #929

Open hudson-ai opened 1 week ago

hudson-ai commented 1 week ago

This PR primarily replaces the implementation of model state (Model._state) with a list of Python objects. Currently, state is a str with embedded "tags" that represent special embedded objects, e.g. HTML for formatting, images for multimodal models, etc.
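To make the contrast concrete, here is a minimal sketch of the two representations. The class names, tag syntax, and rendering logic are illustrative only, not guidance's actual internals:

```python
from dataclasses import dataclass

# Current approach (illustrative tag syntax): one flat string with
# special embedded tags mixed into the text.
flat_state = "The answer is <||_html:<b>_||>42<||_html:</b>_||>"

# Proposed approach: a list of typed state objects.
@dataclass
class Text:
    value: str

@dataclass
class Html:
    value: str

structured_state = [Text("The answer is "), Html("<b>"), Text("42"), Html("</b>")]

# Recovering the plain prompt text no longer requires parsing tags out
# of a string; it's just a filter over typed objects.
prompt = "".join(part.value for part in structured_state if isinstance(part, Text))
print(prompt)  # The answer is 42
```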

On the one hand, the existing implementation is "nice" because it allows adding these special objects as strings inside of normal (and even stateless) guidance functions without having to do anything "dirty" like reach into private methods on Model -- everything is string concatenation.

On the other hand, this means that this string has to be "parsed" to produce actual prompts (usually pretty straightforwardly via re.sub), introspect on token probabilities, or extract image data from tags. This feels fragile, as a model could easily produce strings that match our tags and blow everything up. Furthermore, if we ever have a guidance function that produces "special" strings with formatting, etc. inside of a select block, we're actually asking the model to produce this special structure instead of just the actual textual content...
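The fragility argument can be demonstrated in a few lines. The tag format and regex here are hypothetical stand-ins for whatever the real embedded-tag syntax is; the point is that any regex-based "parser" over a flat string will silently mangle model output that happens to match the tag pattern:

```python
import re

# Illustrative tag format (not guidance's actual one). Stripping the
# tags is how the plain prompt text would be recovered from flat state.
TAG = re.compile(r"<\|\|_.*?_\|\|>")

state = "Hello <||_html:<b>_||>world<||_html:</b>_||>"
print(TAG.sub("", state))  # Hello world

# But if the model itself emits text matching the tag pattern, the
# same pass silently deletes real content:
model_output = "the escape sequence is <||_html:x_||> in our docs"
print(TAG.sub("", model_output))  # the escape sequence is  in our docs
```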

A few TODOs:

Thank you @nking-1 for the help and feedback on this so far! Note that this train of thought originated from the discussion on PR #905 for adding terminal colors. This isn't technically a blocker for that PR, but adding structured state will make that PR trivial :)

paulbkoch commented 4 days ago

Hi @hudson-ai, the one concern I have with moving away from purely text-based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

hudson-ai commented 1 day ago

> Hi @hudson-ai, the one concern I have with moving away from purely text-based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

Thanks for taking a look @paulbkoch :)

I have to think about this a bit, but my first impression is that there is nothing about structured model state that would prohibit special control sequences from the models themselves. Just thinking out loud...

The idea is just to make a logical distinction between the way that we do internal bookkeeping and how the contents of that bookkeeping are displayed to models, to IPython for formatting, etc.
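That distinction could be sketched roughly as follows. Everything here (class names, methods, and the `<|user|>` control sequence) is hypothetical, just to show that role switches can still live in the state as first-class objects, with separate renderings for the model and for display:

```python
from dataclasses import dataclass

@dataclass
class Text:
    value: str
    def to_prompt(self) -> str:
        return self.value
    def to_html(self) -> str:
        return self.value

@dataclass
class RoleOpener:
    role: str
    text: str  # the model-specific control sequence, e.g. "<|user|>\n"
    def to_prompt(self) -> str:
        # The model still sees the raw control sequence...
        return self.text
    def to_html(self) -> str:
        # ...while display layers render it however they like.
        return f"<div class='role'>{self.role}</div>"

state = [RoleOpener("user", "<|user|>\n"), Text("Hi!")]

prompt = "".join(s.to_prompt() for s in state)  # "<|user|>\nHi!"
html = "".join(s.to_html() for s in state)
```

Under this framing, a grammar-driven role switch just appends a `RoleOpener` (or the corresponding control tokens) to the state list, rather than splicing tags into a string.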

Is this reasonable?