sam-cohan opened 1 year ago
That is true, but a chat prompt would also fail on standard completion models, so either way the first example has to break on something :) ...if you think we can improve the error message I am all ears!
I see your point. However, there is a strong incentive to use OpenAI's chat model `gpt-3.5-turbo` instead of `text-davinci-003`, due to the 10x cost reduction in the chat variant. So I was thinking: since, behind the scenes, the chat models just construct a single prompt from the `system` and `user` prompts, perhaps the framework could allow some flexibility here. If the user provides a single prompt, it could simply go into the `user` prompt, with the `system` prompt either empty or something generic. Conversely, if the `user` and `system` prompts are provided separately and the model only accepts a single prompt, the framework could join them with appropriate section headings (ideally ones known to be common across the provider's variants, though that is not essential).
In this way, we can continue to benefit from Guidance at a cheaper cost. Does the motivation and what I am proposing make sense?
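Concretely, I imagine something like this (a rough sketch; none of these function names exist in guidance, they are illustrative only):

# Hypothetical two-way adaptation between single-prompt and chat-style prompts.
def completion_prompt_to_chat(prompt, system="You are a helpful assistant."):
    """A single plain prompt becomes one user message, with a generic
    (or empty) system message in front."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

def chat_prompts_to_completion(system, user):
    """Separate system/user prompts get joined with section headings
    when the model only accepts a single prompt."""
    return f"### System\n{system}\n\n### User\n{user}\n\n### Assistant\n"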
Generally, it seems like a missed opportunity to limit the chat functionality so much. I also noticed that `gen` inside `assistant` is completely neutered (e.g. it does not even support `pattern`). I am not sure of the implementation details, but I don't think there is a good fundamental reason why that should be the case?
I was thinking, if there is a good way to override `gen`, then I can easily modify it to do what I need (i.e. make the chat model behave as if it were a regular completion model).
I am not sure I entirely follow, but the issue here is a limitation of the OpenAI API: we can't yet give partial completions to the assistant role. This means all generations have to be done as the only subtag inside an assistant role. You can use the above style of prompt with a chat model, but you do have to put in the chat tags and respect that limitation :) like this:
import guidance
# set the default language model used to execute guidance programs
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")
# define a guidance program that adapts a proverb
program = guidance("""
{{#system}}You are a helpful agent{{/system}}
{{#user}}Tweak this proverb to apply to model instructions instead.
{{proverb}}
- {{book}} {{chapter}}:{{verse}}
UPDATED
Where there is no guidance{{/user}}
{{#assistant}}{{gen 'rewrite' stop="\\n-"}}{{/assistant}}""")
# execute the program on a specific proverb
executed_program = program(
    proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
    book="Proverbs",
    chapter=11,
    verse=14
)
Thank you very much for your patience and the example.
Could you possibly explain how I can adapt the JSON example to the chat model? Here is my attempt:
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")
valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]
# define a guidance program that generates an RPG character profile in JSON
program = guidance("""
{{#system}}You are a helpful agent{{/system}}
{{#user}}
Please generate character profile for an RPG game in JSON format.
{{/user}}
{{#assistant}}
```json
{
    "id": "{{id}}",
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra' temperature=0.7}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}```
{{/assistant}}""")
# execute the program for a specific character
executed_program = program(
    id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    description="A quick and nimble fighter.",
    valid_weapons=valid_weapons,  # template variables must be passed in
)
leads to
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
I was hoping there is a way to tell the framework "here is a chat model, but just use it as if it were not", meaning that under the hood, every time there is a `gen`, it would get restructured to get the generation from the `assistant` role (or whatever the model interface requires)... In the absence of that, what is the proper way to make the above example work?
To be honest, I think the best approach is what I mentioned before: to write a wrapper that treats the chat model interface as if it were a simple completion interface.
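Something along these lines is what I mean (an untested sketch against the openai-python 0.x API; `ChatAsCompletion` is a made-up name, not part of guidance):

import openai

class ChatAsCompletion:
    """Hypothetical wrapper: expose a completion-style interface
    on top of the chat endpoint (untested sketch)."""

    def __init__(self, model="gpt-3.5-turbo"):
        self.model = model

    def complete(self, prompt, **kwargs):
        # Drop arguments the chat endpoint does not accept.
        kwargs.pop("logprobs", None)
        response = openai.ChatCompletion.create(
            model=self.model,
            messages=[{"role": "user", "content": prompt}],
            **kwargs,
        )
        return response["choices"][0]["message"]["content"]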
I am also confused by this. The docs are a bit lacking in terms of explaining this error; I'm not sure what "partial" completion means here exactly! How would I use a chat model with this to get structured completions while also taking message history into account? I can always handle the message history manually.
@sam-cohan @slundberg
Yeah, I had a look. There are two potential ways this could be done:
Approach 1.) Tweaking the OpenAI class, so you can use `gpt-3.5-turbo` as a drop-in replacement for davinci.
https://github.com/microsoft/guidance/blob/main/guidance/llms/_openai.py
(Details below)
- The main change would be to add a chat_mode called "completion_with_chat".
- When this is set and the user attempts to use a chat model for simpler completion, then on the line with the OpenAI call you instead convert the `prompt` argument into a single-element array for `messages`, coming in from a user, to simulate conversation history (sketched just after this list).
- You would also drop the `logprobs` argument, which the chat API doesn't accept.
- It would only work for the simpler features of guidance, not the ones that need the logprobs. However, you can make the RPG and proverb examples work (for instance) using this conversion. We'd probably want to log a warning message about some behaviors maybe being unsupported in this mode.
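Roughly this conversion (illustrative only; this is the shape of the change, not the actual code in `_openai.py`):

# Illustrative shape of the proposed "completion_with_chat" mode; the
# real change would live inside guidance/llms/_openai.py.
def adapt_call_args(call_args, chat_mode):
    if chat_mode == "completion_with_chat":
        # Simulate a one-turn conversation: the whole completion prompt
        # becomes a single user message.
        call_args["messages"] = [
            {"role": "user", "content": call_args.pop("prompt")}
        ]
        # The chat API does not accept logprobs.
        call_args.pop("logprobs", None)
    return call_args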
Approach 2.) Write a utility class that rewrites a completion-targeted prompt into a ChatGPT-compatible one. Basically, no text between the assistant opening/closing tags; it would have to be pulled out into the previous user message.
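For example (a rough regex-based sketch of mine; a real implementation would parse the template properly rather than rely on a regex):

import re

def hoist_assistant_text(template):
    """Move literal text that precedes the gen tag inside an assistant
    block into the preceding user message (sketch only)."""
    pattern = re.compile(
        r"\{\{#user\}\}(?P<user>.*?)\{\{/user\}\}\s*"
        r"\{\{#assistant\}\}(?P<pre>.*?)(?P<gen>\{\{gen[^}]*\}\})"
        r"(?P<post>.*?)\{\{/assistant\}\}",
        re.DOTALL,
    )

    def fix(m):
        return (
            "{{#user}}" + m["user"] + m["pre"] + "{{/user}}" +
            "{{#assistant}}" + m["gen"] + m["post"] + "{{/assistant}}"
        )

    return pattern.sub(fix, template)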
If there's interest from the repository maintainers, I can open a PR for either approach over next weekend, but only if they think it's okay with their vision for the project.
Thanks for your input @aabdullah-getguru. I was suggesting 1), but yeah, I agree it would be useful for the project maintainers to give a thumbs up on whether this is OK with them. If not, I would love to know what workarounds they suggest...
Is there an update on this?
I may be far off the mark, but it seems one benefit of guidance is the re-use of key/value caches. I don't see how this is possible when using an API (like OpenAI or Anthropic); I would have thought it is only possible when running one's own LLM... is that correct?
So I was confused to see gpt-4 and davinci in some of the examples...
If this is correct, perhaps adding a clarifying sentence to the start of the README would be of benefit.
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.
This means that you can have only one `gen` statement in the assistant role. You cannot have a second `gen` or any other statements. Based on my tests:
- `{{#assistant}}` must be immediately followed by `{{gen ... }}`. No spaces, newlines, or any characters in between.
- `{{gen ... }}` must be followed by `{{/assistant}}`, but you can have any plain text, spaces, or newlines between them. However, you cannot have any library function like another `gen`, a `select`, an `if`, etc.
- I think the assertion also means that you cannot have any `gen` statement in other roles. But I'm not sure of this.
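For example, based on the rules above (my own test strings, not from the docs):

# Valid: gen comes immediately after the opening tag; plain text may
# follow it before the closing tag.
ok = "{{#assistant}}{{gen 'answer'}} -- some trailing text{{/assistant}}"

# Invalid: literal text (even a newline) before the gen, or a second
# library call inside the same assistant block, trips the assertion.
bad = "{{#assistant}}Answer: {{gen 'answer'}}{{gen 'extra'}}{{/assistant}}"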
So the chat completions are limited to simple generations in the assistant role. You cannot do structured generations like the JSON @sam-cohan tried.
This is a HUGE bummer because I was able to successfully do structured JSON generations with the chat model using LangChain + Pydantic. So it should be possible, and I was looking forward to replicating it in Guidance.
Why do we have this assertion? `assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat...`
@aabdullah-getguru Could you elaborate more on your proposal above, please? I only have access to `gpt-3.5-turbo` and need it to work on the RPG example. AFAIK, in chat mode only one `gen` statement is allowed inside the assistant role, such that one can only generate a single value (`gen` statement) per key. In order to fill a whole JSON, would I need to add a key to the JSON, place the `gen` statement where its value should be generated, then, after getting that result, update the JSON and append the next key and `gen` statement, repeating until done?
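i.e., something like this? (an untested sketch of the loop I mean; the prompt wording is mine):

import json
import guidance

guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# Untested sketch: fill the JSON one key at a time, with exactly one
# gen directly inside the assistant role per call.
program = guidance("""
{{#system}}You are a helpful agent{{/system}}
{{#user}}Here is a partial RPG character profile in JSON:
{{partial}}
Reply with ONLY the value for the key "{{key}}".{{/user}}
{{#assistant}}{{gen 'value' max_tokens=20}}{{/assistant}}""")

profile = {"description": "A quick and nimble fighter."}
for key in ["name", "class", "mantra"]:
    out = program(partial=json.dumps(profile, indent=2), key=key)
    profile[key] = out["value"].strip().strip('"')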
Any support or updates for the newest guidance version?
Any updates on this one, please?
The bug
OpenAI chat models are not compatible with the basic first example.

To Reproduce
Give a full working code snippet that can be pasted into a notebook cell or python file. Make sure to include the LLM load step so we know which model you are using.

Output:

System info (please complete the following information):
- `guidance.__version__`: '0.0.56'