guidance-ai / guidance

A guidance language for controlling large language models.
MIT License

[WIP] Structured model state #929

Open hudson-ai opened 1 week ago

hudson-ai commented 1 week ago

This PR primarily replaces the implementation of model state (Model._state) with a list of Python objects. Currently, state is a str with embedded "tags" that represent special embedded objects, e.g. HTML for formatting, images for multimodal models, etc.
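To make the contrast concrete, here is a minimal sketch of the two representations. The class names, tag syntax, and rendering logic are illustrative only, not guidance's actual internals:

```python
from dataclasses import dataclass

# Current approach (illustrative tag syntax): one flat string with
# special embedded tags mixed into the text.
flat_state = "The answer is <||_html:<b>_||>42<||_html:</b>_||>"

# Proposed approach: a list of typed state objects.
@dataclass
class Text:
    value: str

@dataclass
class Html:
    value: str

structured_state = [Text("The answer is "), Html("<b>"), Text("42"), Html("</b>")]

# Recovering the plain prompt text no longer requires parsing tags out
# of a string; it's just a filter over typed objects.
prompt = "".join(part.value for part in structured_state if isinstance(part, Text))
print(prompt)  # The answer is 42
```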

On the one hand, the existing implementation is "nice" because it allows adding these special objects as strings inside of normal (and even stateless) guidance functions without having to do anything "dirty" like reach into private methods on Model -- everything is string concatenation.

On the other hand, this means that this string has to be "parsed" to produce actual prompts (usually pretty straightforwardly via re.sub), introspect on token probabilities, or extract image data from tags. This feels fragile, as a model could easily produce strings that match our tags and blow everything up. Furthermore, if we ever have a guidance function that produces "special" strings with formatting, etc. inside of a select block, we're actually asking the model to produce this special structure instead of just the actual textual content...
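The fragility argument can be demonstrated in a few lines. The tag format and regex here are hypothetical stand-ins for whatever the real embedded-tag syntax is; the point is that any regex-based "parser" over a flat string will silently mangle model output that happens to match the tag pattern:

```python
import re

# Illustrative tag format (not guidance's actual one). Stripping the
# tags is how the plain prompt text would be recovered from flat state.
TAG = re.compile(r"<\|\|_.*?_\|\|>")

state = "Hello <||_html:<b>_||>world<||_html:</b>_||>"
print(TAG.sub("", state))  # Hello world

# But if the model itself emits text matching the tag pattern, the
# same pass silently deletes real content:
model_output = "the escape sequence is <||_html:x_||> in our docs"
print(TAG.sub("", model_output))  # the escape sequence is  in our docs
```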

A few TODOs:

Thank you @nking-1 for the help and feedback on this so far! Note that this train of thought originated from the discussion on PR #905 for adding terminal colors. This isn't technically a blocker for that PR, but adding structured state will make that PR trivial :)

paulbkoch commented 4 days ago

Hi @hudson-ai, the one concern I have with moving away from purely text-based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

hudson-ai commented 1 day ago

> Hi @hudson-ai, the one concern I have with moving away from purely text-based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

Thanks for taking a look @paulbkoch :)

I have to think about this a bit, but my first impression is that there is nothing about structured model state that would prohibit special control sequences from the models themselves. Just thinking out loud...

The idea is just to make a logical distinction between the way that we do internal bookkeeping and how the contents of that bookkeeping are displayed to models, to IPython for formatting, etc.
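That distinction could be sketched roughly as follows. Everything here (class names, methods, and the `<|user|>` control sequence) is hypothetical, just to show that role switches can still live in the state as first-class objects, with separate renderings for the model and for display:

```python
from dataclasses import dataclass

@dataclass
class Text:
    value: str
    def to_prompt(self) -> str:
        return self.value
    def to_html(self) -> str:
        return self.value

@dataclass
class RoleOpener:
    role: str
    text: str  # the model-specific control sequence, e.g. "<|user|>\n"
    def to_prompt(self) -> str:
        # The model still sees the raw control sequence...
        return self.text
    def to_html(self) -> str:
        # ...while display layers render it however they like.
        return f"<div class='role'>{self.role}</div>"

state = [RoleOpener("user", "<|user|>\n"), Text("Hi!")]

prompt = "".join(s.to_prompt() for s in state)  # "<|user|>\nHi!"
html = "".join(s.to_html() for s in state)
```

Under this framing, a grammar-driven role switch just appends a `RoleOpener` (or the corresponding control tokens) to the state list, rather than splicing tags into a string.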

Is this reasonable?