run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: LLMTextCompletionProgram fails with KeyError when prompt contains JSON example data #12390

Open kkarski opened 4 months ago

kkarski commented 4 months ago

Bug Description

In order to help the LLM output valid JSON objects, it's often helpful to provide a few JSON examples in addition to the Pydantic schema.

The issue is that if any literal JSON appears in prompt_template_str, the first {"property" ... } block of the JSON example is interpreted as a {variable} placeholder by str.format() when the message is prepared before invoking the LLM. Since it is not actually a variable to be substituted, no value can be supplied for it, and str.format() fails with a KeyError.
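A minimal repro sketch (plain Python, not library code; the template text is illustrative) shows how literal JSON braces get parsed as str.format() replacement fields:

```python
# Literal JSON braces in a template are parsed as str.format() fields.
template = 'Return JSON like this example: {"name": "Alice"}\nQuery: {query}'

try:
    template.format(query="Who is Alice?")
except KeyError as exc:
    # str.format() treats {"name": ...} as a field named '"name"',
    # which has no corresponding keyword argument.
    print(f"KeyError: {exc}")
```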

Version

llama-index==0.10.15

Steps to Reproduce

Place any JSON content in the prompt_template_str before invoking LLMTextCompletionProgram.from_defaults()

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 4 months ago

Hello @kkarski! 👋 I'm Dosu, a bot here to assist you with bug-fixing, answering questions, and guiding you through contributions while we wait for a human maintainer. I'm diving into your issue regarding the LLMTextCompletionProgram and the JSON data in prompt_template_str. Expect a full answer shortly!

logan-markewich commented 4 months ago

This is very hard to fix with python string formatting 😓

I suggest escaping your JSON with double braces (i.e. {{ and }})
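For instance, using standard Python format-string escaping, where doubled braces render as literal single braces (the template text here is illustrative):

```python
# JSON braces escaped as {{ }}; only {query} remains a real placeholder.
prompt_template_str = (
    'Output JSON like this example: {{"name": "Alice", "age": 30}}\n'
    'Question: {query}'
)

# Formatting now succeeds, and the doubled braces collapse to single ones.
print(prompt_template_str.format(query="Who is Alice?"))
```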

dosubot[bot] commented 4 months ago

To address the KeyError you're encountering with LLMTextCompletionProgram when your prompt includes JSON data, one practical option is the convert_to_handlebars function in the LlamaIndex codebase. It converts Python format strings, which use single braces {} for variable substitution and double braces {{}} for escaping, into handlebars-style templates, where double braces {{}} denote variable substitution.

Here's a concise guide on how to use the convert_to_handlebars function:

  1. Start with your JSON data as a Python string, ensuring dynamic values are represented using Python's string formatting syntax (i.e., {variable_name}).
  2. Apply the convert_to_handlebars function to transform your Python format string into a handlebars-style template. This conversion process automatically handles the escaping of JSON data, making it compatible with the LLMTextCompletionProgram.
  3. You can now safely use the converted string as a prompt in LLMTextCompletionProgram without triggering the KeyError due to JSON data.

For example:

from llama_index.core.prompts.guidance_utils import convert_to_handlebars

json_data_with_python_format = '{"name": "{user_name}", "age": {user_age}}'
handlebars_template = convert_to_handlebars(json_data_with_python_format)

This approach sidesteps the difficulty of escaping JSON data within Python string formatting, letting you embed JSON examples in prompts for LLMTextCompletionProgram.

The convert_to_handlebars function works by first replacing double braces with a temporary placeholder, then single braces with double braces, and finally, the temporary placeholder with single braces. This ensures that your JSON data is correctly formatted for use in the prompt without encountering the KeyError.
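The three-step swap can be sketched roughly as follows. This is an illustrative reimplementation, not the library's exact code, and the placeholder strings are arbitrary values assumed not to occur in the input:

```python
def convert_to_handlebars_sketch(text: str) -> str:
    # Placeholders assumed not to occur in the input text.
    left_ph, right_ph = "\x00L\x00", "\x00R\x00"
    # 1. Protect Python-escaped (doubled) braces with placeholders.
    text = text.replace("{{", left_ph).replace("}}", right_ph)
    # 2. Turn single-brace format variables into handlebars variables.
    text = text.replace("{", "{{").replace("}", "}}")
    # 3. Restore the protected braces as literal single braces.
    return text.replace(left_ph, "{").replace(right_ph, "}")

print(convert_to_handlebars_sketch("Hello {name}, literal: {{x}}"))
# -> Hello {{name}}, literal: {x}
```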

You can find the convert_to_handlebars function in the llama-index-core/llama_index/core/prompts/guidance_utils.py file.

As for updates or newer versions of LlamaIndex that might address this issue directly, I couldn't find specific information in the repository. If you need further assistance or have additional questions, feel free to ask.

