guidance-ai / guidance

A guidance language for controlling large language models.

Basic example broken with chat-type OpenAI models #128

Open sam-cohan opened 1 year ago

sam-cohan commented 1 year ago

The bug: OpenAI chat models are not compatible with the basic first example from the README.

To Reproduce:

import guidance

# set the default language model used to execute guidance programs
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# define a guidance program that adapts a proverb
program = guidance("""Tweak this proverb to apply to model instructions instead.

{{proverb}}
- {{book}} {{chapter}}:{{verse}}

UPDATED
Where there is no guidance{{gen 'rewrite' stop="\\n-"}}
- GPT {{gen 'chapter'}}:{{gen 'verse'}}""")

# execute the program on a specific proverb
executed_program = program(
    proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
    book="Proverbs",
    chapter=11,
    verse=14
)

Output:

Tweak this proverb to apply to model instructions instead.

Where there is no guidance, a people falls,
but in an abundance of counselors there is safety.
- Proverbs 11:14

UPDATED
Where there is no guidance
Traceback (most recent call last):
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 94, in run
    await self.visit(self.parse_tree)
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 218, in visit
    visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 218, in <listcomp>
    visited_children = [await self.visit(child, next_node, next_next_node, prev_node, node, parent_node) for child in node.children]
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 429, in visit
    visited_children.append(await self.visit(child, inner_next_node, inner_next_next_node, inner_prev_node, node, parent_node))
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/_program_executor.py", line 292, in visit
    command_output = await command_function(*positional_args, **named_args)
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/library/_gen.py", line 137, in gen
    gen_obj = await parser.llm_session(
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 520, in __call__
    out = self.llm.caller(**call_args)
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 307, in _library_call
    kwargs['messages'] = prompt_to_messages(kwargs['prompt'])
  File "/home/ec2-user/SageMaker/custom-miniconda/miniconda/envs/custom_python39/lib/python3.9/site-packages/guidance/llms/_openai.py", line 21, in prompt_to_messages
    assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting."
AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.

Error in program:  When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.


slundberg commented 1 year ago

That is true, but a chat prompt would also fail on standard completion models, so either way the first example has to break on something :) ...if you think we can improve the error message I am all ears!

sam-cohan commented 1 year ago

I see your point. However, there is a strong incentive to use OpenAI's chat model gpt-3.5-turbo instead of text-davinci-003, due to the 10x cost reduction in the chat variant. Since, behind the scenes, the chat models just construct a single prompt from the system and user messages, perhaps the framework could offer the same flexibility: if the user provides a single prompt, it could simply go into the user message, with the system message either empty or something generic. Conversely, if the user and system prompts are provided separately but the model only accepts a single prompt, the framework could join them under appropriate section headings (ideally ones known to be common across the provider's model variants, though that is not essential).

In this way, we can continue to benefit from Guidance at a lower cost. Do the motivation and the proposal make sense?
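
For illustration, here is a minimal sketch of the two conversions I have in mind (the function names and headings are made up, not anything guidance currently provides):

def prompt_to_chat(prompt, system="You are a helpful assistant."):
    # completion-style prompt -> chat messages, with a generic system message
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]

def chat_to_prompt(system, user):
    # separate system/user prompts -> single completion-style prompt,
    # joined under simple section headings (the heading text is arbitrary)
    return "### System\n" + system + "\n\n### User\n" + user + "\n"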

sam-cohan commented 1 year ago

Generally, it seems like a missed opportunity to limit the chat functionality so much. I also noticed that gen inside assistant is completely neutered (e.g. it does not even support pattern). I am not sure of the implementation details, but I don't think there is a good fundamental reason why that should be the case?

sam-cohan commented 1 year ago

I was thinking that if there is a good way to override gen, then I can easily modify it to do what I need (i.e. make the chat model behave as if it were a regular completion model).

slundberg commented 1 year ago

I am not sure I entirely follow, but the issue here is a limitation of the OpenAI API: we can't yet give partial completions to the assistant role. This means all generations have to be done as the only subtag inside an assistant role. You can use the above style of prompt with a chat model, but you do have to put in the chat tags and respect that limitation :) like this:

import guidance

# set the default language model used to execute guidance programs
guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

# define a guidance program that adapts a proverb
program = guidance("""
{{#system}}You are a helpful agent{{/system}}

{{#user}}Tweak this proverb to apply to model instructions instead.

{{proverb}}
- {{book}} {{chapter}}:{{verse}}

UPDATED
Where there is no guidance{{/user}}

{{#assistant}}{{gen 'rewrite' stop="\\n-"}}{{/assistant}}""")

# execute the program on a specific proverb
executed_program = program(
    proverb="Where there is no guidance, a people falls,\nbut in an abundance of counselors there is safety.",
    book="Proverbs",
    chapter=11,
    verse=14
)

sam-cohan commented 1 year ago

Thank you very much for your patience and the example.

Could you possibly explain how I can adapt the JSON example to the chat model? Here is my attempt:

import guidance

guidance.llm = guidance.llms.OpenAI("gpt-3.5-turbo")

valid_weapons = ["sword", "axe", "mace", "spear", "bow", "crossbow"]

# define a guidance program that generates an RPG character profile
program = guidance("""
{{#system}}You are a helpful agent{{/system}}

{{#user}}
Please generate character profile for an RPG game in JSON format.
{{/user}}

{{#assistant}}

```json
{
    "id": "{{id}}",
    "description": "{{description}}",
    "name": "{{gen 'name'}}",
    "age": {{gen 'age' pattern='[0-9]+' stop=','}},
    "armor": "{{#select 'armor'}}leather{{or}}chainmail{{or}}plate{{/select}}",
    "weapon": "{{select 'weapon' options=valid_weapons}}",
    "class": "{{gen 'class'}}",
    "mantra": "{{gen 'mantra' temperature=0.7}}",
    "strength": {{gen 'strength' pattern='[0-9]+' stop=','}},
    "items": [{{#geneach 'items' num_iterations=5 join=', '}}"{{gen 'this' temperature=0.7}}"{{/geneach}}]
}```

{{/assistant}}""")

# execute the program with a specific character id and description
executed_program = program(
    id="e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    description="A quick and nimble fighter.",
    valid_weapons=valid_weapons,
)

leads to

AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.

I was hoping there is a way to tell the framework "here is a chat model, but just use it as if it were not", meaning that under the hood, every gen would get restructured to pull its generation from the assistant role (or whatever the model interface requires)... In the absence of that, what is the proper way to make the above example work?
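
One workaround I can think of is to give every gen its own assistant block, move the scaffolding into user turns, and assemble the JSON in Python afterwards. A rough, untested sketch (the per-field prompts are made up):

program = guidance("""
{{#system}}You are a helpful agent{{/system}}

{{#user}}We are building an RPG character profile, one field at a time.
Description: {{description}}
What is the character's name? Reply with the name only.{{/user}}

{{#assistant}}{{gen 'name' max_tokens=10}}{{/assistant}}

{{#user}}What is the character's age? Reply with a number only.{{/user}}

{{#assistant}}{{gen 'age' max_tokens=3}}{{/assistant}}""")

executed_program = program(description="A quick and nimble fighter.")
profile = {
    "id": "e1f491f7-7ab8-4dac-8c20-c92b5e7d883d",
    "description": "A quick and nimble fighter.",
    "name": executed_program["name"],
    "age": executed_program["age"],
}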

sam-cohan commented 1 year ago

To be honest, I think the best approach is what I mentioned before: to write a wrapper that treats the chat model interface as if it were a simple completion interface.
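
For example, something along these lines (a rough sketch against the pre-1.0 openai library; the function is hypothetical and untested):

import openai

def chat_as_completion(prompt, model="gpt-3.5-turbo", **kwargs):
    # present the whole completion-style prompt as a single user message
    # and return only the generated assistant text
    response = openai.ChatCompletion.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        **kwargs,
    )
    return response["choices"][0]["message"]["content"]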

RamiAwar commented 1 year ago

I am also confused by this. The docs are a bit lacking when it comes to explaining the error; I'm not sure what exactly "partial assistant prompting" means as opposed to normal prompting! How would I use a chat model with this to get structured completions while also taking message history into account? I can always handle the message history manually.

aabdullah-getguru commented 1 year ago

@sam-cohan @slundberg

Yeah, I had a look. There are two potential ways this could be done:

Approach 1: Tweak the OpenAI class so that you can use gpt-3.5-turbo as a drop-in replacement for davinci. https://github.com/microsoft/guidance/blob/main/guidance/llms/_openai.py

  • The main change would be to add a chat_mode called "completion_with_chat".
  • When this mode is set and the user attempts to use a chat model for plain completion, the openai call converts the prompt argument into a single-element messages array coming from the user, to simulate conversation history.
  • The logprobs argument, which the chat API doesn't accept, would also be dropped.
  • This would only work for the simpler features of guidance, not the ones that need logprobs. However, it would make the RPG and proverb examples work, for instance. We'd probably want to log a warning that some behaviors may be unsupported in this mode.

Approach 2: Write a utility class that rewrites a completion-targeted prompt into a chat-compatible one. Basically, no text is allowed between the assistant opening/closing tags, so it would have to be pulled out manually into the previous user message.

If there's interest from the repository maintainers, I can open a PR for either approach over the next weekend, but only if they think it's okay with their vision for the project.
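
To make approach 1 concrete, the conversion would be roughly this (a hypothetical sketch, not actual guidance code):

def completion_call_args_to_chat(call_args):
    # hypothetical "completion_with_chat" conversion: wrap the raw prompt
    # in a single user message and drop arguments the chat API rejects
    call_args = dict(call_args)
    call_args["messages"] = [{"role": "user", "content": call_args.pop("prompt")}]
    call_args.pop("logprobs", None)  # the chat API does not accept logprobs
    call_args.pop("echo", None)
    return call_args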

sam-cohan commented 1 year ago

Thanks for your input @aabdullah-getguru. I was suggesting approach 1, but yeah, I agree it would be useful for the project maintainers to give a thumbs-up on whether this is okay with them. If not, I would love to know what workarounds they suggest...

krrishdholakia commented 1 year ago

Is there an update on this?

RonanKMcGovern commented 1 year ago

I may be far off the mark, but it seems one benefit of guidance is the reuse of key/value caches. I don't see how this is possible when using an API (like OpenAI or Anthropic); I would have thought it is only possible when running one's own LLM... is that correct?

So I was confused to see gpt-4 and davinci in some of the examples...

If this is correct, perhaps adding a clarifying sentence to the start of the README would be of benefit.

ADTC commented 1 year ago

AssertionError: When calling OpenAI chat models you must generate only directly inside the assistant role! The OpenAI API does not currently support partial assistant prompting.

This means that you can have only one gen statement in the assistant role; you cannot have a second gen or any other statements. Based on my tests, I think the assertion also means that you cannot have any gen statement in other roles, but I'm not sure of this.

So the chat completions are limited to simple generations in the assistant role. You cannot do structured generations like the JSON @sam-cohan tried.

This is a HUGE bummer because I was able to successfully do structured JSON generations with the chat model using LangChain + Pydantic. So it should be possible, and I was looking forward to replicating it in Guidance.

Why do we have this assertion? assert prompt.endswith("<|im_start|>assistant\n"), "When calling OpenAI chat...
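
From the traceback earlier in this thread, the function looks roughly like this (a simplified reconstruction of guidance/llms/_openai.py, not the exact source):

import re

def prompt_to_messages(prompt):
    # the prompt must end with an empty assistant turn, because the chat
    # API can only generate a brand-new assistant message; it cannot
    # finish a partially written one
    assert prompt.endswith("<|im_start|>assistant\n"), \
        "When calling OpenAI chat models you must generate only directly inside the assistant role!"
    pattern = r'<\|im_start\|>(\w+)(.*?)(?=<\|im_end\|>|$)'
    matches = re.findall(pattern, prompt, re.DOTALL)
    return [{"role": role, "content": content.strip()} for role, content in matches]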

julienma96 commented 1 year ago

> @sam-cohan @slundberg
>
> Yeah, I had a look. There are two potential ways this could be done:
>
> Approach 1: Tweak the OpenAI class so that you can use gpt-3.5-turbo as a drop-in replacement for davinci. https://github.com/microsoft/guidance/blob/main/guidance/llms/_openai.py
>
> • The main change would be to add a chat_mode called "completion_with_chat".
> • When this mode is set and the user attempts to use a chat model for plain completion, the openai call converts the prompt argument into a single-element messages array coming from the user, to simulate conversation history.
> • The logprobs argument, which the chat API doesn't accept, would also be dropped.
> • This would only work for the simpler features of guidance, not the ones that need logprobs. However, it would make the RPG and proverb examples work, for instance. We'd probably want to log a warning that some behaviors may be unsupported in this mode.
>
> Approach 2: Write a utility class that rewrites a completion-targeted prompt into a chat-compatible one. Basically, no text is allowed between the assistant opening/closing tags, so it would have to be pulled out manually into the previous user message.
>
> If there's interest from the repository maintainers, I can open a PR for either approach over the next weekend, but only if they think it's okay with their vision for the project.

Could you elaborate more on this, please? I only have access to gpt-3.5-turbo and need it to work for the RPG example. AFAIK, in chat mode only one gen statement is allowed inside the assistant role, so one can only generate one value (one gen statement) per key. To fill a whole JSON, would I need to add a key to the JSON, place the gen statement where its value should be generated, then after getting the result update the JSON and append the next key and gen statement, repeating until done?
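
In other words, something like this loop (a sketch with the raw pre-1.0 openai library, untested; the field list and prompts are made up):

import json
import openai

fields = ["name", "age", "class", "mantra"]
profile = {"description": "A quick and nimble fighter."}

for field in fields:
    # show the partial JSON built so far and ask for the next value only
    partial = json.dumps(profile, indent=4)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful agent."},
            {"role": "user", "content": "Here is a partial RPG character profile:\n"
                + partial + "\nGive a value for the field '" + field + "'. Reply with the value only."},
        ],
    )
    profile[field] = response["choices"][0]["message"]["content"].strip()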

hrusli commented 8 months ago

Any support or updates for the newest guidance version?

younes-io commented 8 months ago

Any updates on this one, please?