[Bug]: OutputParserException: Failed to parse pydantic object from guidance program

younes-io commented 11 months ago

Bug Description

@Harsha-Nori:

When I execute the code below, taken from the official docs:

import openai  # needed for the openai.api_* settings used below
from pydantic import BaseModel
from typing import List
from guidance.llms import OpenAI

from llama_index.program import GuidancePydanticProgram

class Song(BaseModel):
    title: str
    length_seconds: int

class Album(BaseModel):
    name: str
    artist: str
    songs: List[Song]

program = GuidancePydanticProgram(
    output_cls=Album,
    prompt_template_str=(
        "Generate an example album, with an artist and a list of songs. Using"
        " the movie {{movie_name}} as inspiration"
    ),
    guidance_llm=OpenAI(
        api_type=str(openai.api_type),
        api_key=str(openai.api_key),
        api_base=str(openai.base_url),
        api_version=str(openai.api_version),
        model='gpt-35-turbo',
        deployment_id=model_name,  # model_name is not defined in this snippet
    ),
    verbose=True,
)

# print("program.guidance_llm: ", program)
output = program(movie_name="The Shining")

I get this log:

Generate an example album, with an artist and a list of songs. Using the movie {{movie_name}} as inspiration
`` `json
{
  "name": "{{gen 'name' stop='"'}}",
  "artist": "{{gen 'artist' stop='"'}}",
  "songs": [{{#geneach 'songs' stop=']'}}{{#unless @first}}, {{/unless}}{
  "title": "{{gen 'title' stop='"'}}",
  "length_seconds": "{{gen 'length_seconds' stop='"'}}",
}{{/geneach}}],
}
` ``

ParserError: while parsing a flow mapping
  in "<unicode string>", line 1, column 1:
    {
    ^
expected ',' or '}', but got '<scalar>'
  in "<unicode string>", line 2, column 31:
      "name": "{gen 'name' stop='"'}",
                                  ^

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
File ~/dev/dir/project/.venv/lib/python3.9/site-packages/llama_index/prompts/guidance_utils.py:149, in parse_pydantic_from_guidance_program(program, cls, verbose)
    148     print(output)
--> 149 json_dict = parse_json_markdown(output)
    150 sub_questions = cls.parse_obj(json_dict)

File ~/dev/dir/project/.venv/lib/python3.9/site-packages/llama_index/output_parsers/utils.py:54, in parse_json_markdown(text)
     53 except yaml.YAMLError as e_yaml:
---> 54     raise OutputParserException(
     55         f"Got invalid JSON object. Error: {e_json} {e_yaml}. "
     56         f"Got JSON string: {json_string}"
     57     )
     58 except NameError as exc:

OutputParserException: Got invalid JSON object. Error: Expecting ',' delimiter: line 2 column 31 (char 32) while parsing a flow mapping
  in "<unicode string>", line 1, column 1:
    {
    ^
expected ',' or '}', but got '<scalar>'
  in "<unicode string>", line 2, column 31:
      "name": "{gen 'name' stop='"'}",
                                  ^. Got JSON string: {
  "name": "{gen 'name' stop='"'}",
  "artist": "{gen 'artist' stop='"'}",
  "songs": [{#geneach 'songs' stop=']'}{#unless @first}, {/unless}{
  "title": "{gen 'title' stop='"'}",
  "length_seconds": "{gen 'length_seconds' stop='"'}",
}{/geneach}],
}

The above exception was the direct cause of the following exception:

OutputParserException                     Traceback (most recent call last)
Cell In[27], line 35
     17 program = GuidancePydanticProgram(
     18     output_cls=Album,
     19     prompt_template_str=(
   (...)
     31     verbose=True,
     32 )
     34 # print("program.guidance_llm: ", program)
---> 35 output = program(movie_name="The Shining")

File ~/dev/dir/project/.venv/lib/python3.9/site-packages/llama_index/program/guidance_program.py:80, in GuidancePydanticProgram.__call__(self, *args, **kwargs)
     73 def __call__(
     74     self,
     75     *args: Any,
     76     **kwargs: Any,
     77 ) -> BaseModel:
     78     executed_program = self._guidance_program(**kwargs)
---> 80     return parse_pydantic_from_guidance_program(
     81         program=executed_program, cls=self._output_cls
     82     )

File ~/dev/dir/project/.venv/lib/python3.9/site-packages/llama_index/prompts/guidance_utils.py:152, in parse_pydantic_from_guidance_program(program, cls, verbose)
    150     sub_questions = cls.parse_obj(json_dict)
    151 except Exception as e:
--> 152     raise OutputParserException(
    153         "Failed to parse pydantic object from guidance program"
    154     ) from e
    155 return sub_questions

OutputParserException: Failed to parse pydantic object from guidance program

Version

Version: 0.9.13

Steps to Reproduce

Execute the code above in a notebook (python 3.9.18)

Relevant Logs/Tracebacks

cf. above

dosubot[bot] commented 11 months ago

🤖

Hey @younes-io, good to see you again! Thanks for presenting this detailed issue. I'm diving into it and will have a full response for you shortly. Hang tight!

younes-io commented 11 months ago

FYI I use

Name: guidance Version: 0.0.64

teis-e commented 9 months ago

Did you manage to fix this?

I got this with both stabilityai/stablelm-zephyr-3b and zephyr 7b

dosubot[bot] commented 9 months ago

🤖

Based on the error message you provided, the problem is in the JSON string being parsed. The parser fails at line 2, column 31: it expected a ',' or '}' but found a '<scalar>' instead.

Looking at the JSON object in your error message:

{
  "name": "{gen 'name' stop='"'}",
  "artist": "{gen 'artist' stop='"'}",
  "songs": [{#geneach 'songs' stop=']'}{#unless @first}, {/unless}{
  "title": "{gen 'title' stop='"'}",
  "length_seconds": "{gen 'length_seconds' stop='"'}",
}{/geneach}],
}

It seems like the issue might be with the {gen 'name' stop='"'} and {gen 'artist' stop='"'} parts of the JSON object. The {gen ...} syntax seems to be a placeholder for generating some value, but it's not valid JSON syntax. This could be why the parser is failing.
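As a minimal sketch of why the parser stops there, the snippet below feeds the first key of the string from the traceback to Python's json.loads. Only the opening lines of the failing string are reproduced, and the object is closed by hand so that it is complete; the variable names are illustrative.

```python
import json

# First key of the string from the traceback, closed by hand so the object is complete.
unrendered = "\n".join([
    "{",
    """  "name": "{gen 'name' stop='"'}",""",
    "}",
])

try:
    json.loads(unrendered)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(error)  # Expecting ',' delimiter: line 2 column 31 (char 32)
```

The double quote inside stop='"' ends the JSON string early, so the parser sees a stray ' where it expects a comma. This is the same position (line 2, column 31) that appears in the log above.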

The parse_json_markdown function in llama_index/output_parsers/utils.py is designed to parse a JSON object embedded within a markdown text block. The expected format of the JSON object is a valid JSON object or array that can be parsed by Python's json.loads function or PyYAML's yaml.safe_load function.

If the JSON object or array cannot be parsed by json.loads, the function tries to parse it with yaml.safe_load, which is less strict and allows for trailing commas. If this also fails, it raises an OutputParserException.

So, the expected format is valid JSON or YAML that can be parsed by Python's json.loads or PyYAML's yaml.safe_load. It should be embedded within a markdown code block, surrounded by "```json" and "```".
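A rough sketch of that flow, assuming nothing about the real implementation in llama_index/output_parsers/utils.py: pull the text out of the fenced block, then hand it to json.loads. The helper names and the sample reply are made up for illustration, and the YAML fallback is left out to keep the example standard-library only.

```python
import json
import re

# Three backticks, built programmatically so this example does not contain a literal fence.
FENCE = "`" * 3

def extract_json_block(text: str) -> str:
    """Return the text inside the first fenced block, or the stripped input if there is none."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(.*?)\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else text.strip()

def parse_markdown_json(text: str) -> dict:
    """Hypothetical helper: extract the fenced block and parse it as strict JSON."""
    return json.loads(extract_json_block(text))

# Illustrative model reply with a fully generated (valid) JSON object.
reply = (
    "Here is the album:\n"
    + FENCE + "json\n"
    + '{"name": "Overlook", "artist": "Room 237", "songs": []}'
    + "\n" + FENCE
)

album = parse_markdown_json(reply)
print(album["name"])
```

With a fully generated object like this one, parsing succeeds. With the unrendered template from the log, the same call raises json.JSONDecodeError.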

You can find more details about this in the llama_index/output_parsers/utils.py file.

To resolve this issue, you might need to ensure that the JSON object being passed to the parse_json_markdown function is a valid JSON object. If the {gen ...} syntax is a placeholder, you might need to replace it with a valid value before passing it to the function.
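One way to catch this earlier is to check the program output for leftover template tags before trying to parse it. The sketch below is a hypothetical guard, not part of llama_index or guidance; the tag names (gen, geneach, unless) are the ones visible in the log above.

```python
import re

# Matches the start of handlebars-style tags such as {{gen ...}}, {#geneach ...} or {/unless}.
PLACEHOLDER = re.compile(r"\{+\s*[#/]?(?:gen|geneach|unless)\b")

def find_unrendered_placeholders(text: str) -> list:
    """Return the opening part of every template tag still present in text."""
    return [m.group(0) for m in PLACEHOLDER.finditer(text)]

line_from_log = "\"songs\": [{#geneach 'songs' stop=']'}{#unless @first}, {/unless}{"
leftovers = find_unrendered_placeholders(line_from_log)

if leftovers:
    print("Template was not executed; leftover tags:", leftovers)
```

If the list is non-empty, the guidance program returned its template text rather than generated values, so the problem lies upstream of the JSON parser.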

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
