langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
92.48k stars 14.8k forks

FewShotPromptTemplate example formatting bug #8433

Closed anyangml closed 10 months ago

anyangml commented 1 year ago

System Info

Context

I am trying to build a prompt that converts a LaTeX string generated by an OCR algorithm into text describing that LaTeX. When I use FewShotPromptTemplate, the curly brackets in the LaTeX string are interpreted as keys into a dict.

Who can help?

@hwchase17

Reproduction

Code

from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

examples = [
  {
    # Raw string so the backslash stays literal
    "latex": r"""\sum_{i=1}^{n}""",
    "doc": """taking sum from 1 to n"""
  }
]

example_template = """
latex: {latex}
doc: {doc}
"""
prefix = """ Convert the latex
"""

suffix = """
User: {latex}
AI: """

example_prompt = PromptTemplate(input_variables=["latex", "doc"], template="Question: {latex}\n{doc}")

few_shot_prompt_template = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix=prefix,
    suffix=suffix,
    input_variables=["latex"],
    example_separator="\n\n"
)

print(example_prompt.format(**examples[0]))
# Raw string: in a normal string literal, "\f" would be read as a form feed
print(few_shot_prompt_template.format(latex=r"\frac{a}{b}"))

Expected behavior

Error

The PromptTemplate.format works fine, but the FewShotPromptTemplate fails.

---> 34 print(few_shot_prompt_template.format(latex="\frac{a}{b}"))

File ~/Library/Caches/pypoetry/virtualenvs/expression-engine-OXFJOYa8-py3.8/lib/python3.8/site-packages/langchain/prompts/few_shot.py:123, in FewShotPromptTemplate.format(self, **kwargs)
    120 template = self.example_separator.join([piece for piece in pieces if piece])
    122 # Format the template with the input variables.
--> 123 return DEFAULT_FORMATTER_MAPPING[self.template_format](template, **kwargs)

File /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/string.py:163, in Formatter.format(self, format_string, *args, **kwargs)
    162 def format(self, format_string, /, *args, **kwargs):
--> 163     return self.vformat(format_string, args, kwargs)

File ~/Library/Caches/pypoetry/virtualenvs/expression-engine-OXFJOYa8-py3.8/lib/python3.8/site-packages/langchain/utils/formatting.py:29, in StrictFormatter.vformat(self, format_string, args, kwargs)
     24 if len(args) > 0:
     25     raise ValueError(
     26         "No arguments should be provided, "
...
    227     return args[key]
    228 else:
--> 229     return kwargs[key]

KeyError: 'i=1'
dosubot[bot] commented 1 year ago

Answer generated by a 🤖

Answer

I understand that you're encountering an issue with the FewShotPromptTemplate in LangChain. Specifically, the curly brackets in your LaTeX string are being interpreted as keys to a dictionary, which is causing a KeyError.

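The problem arises from Python's built-in string formatting. When a string is formatted with the format method, anything inside curly braces {} is treated as a replacement field. As the traceback shows, FewShotPromptTemplate.format joins the prefix, the rendered examples, and the suffix into a single template and then formats that string again with your input variables. By then the braces from your LaTeX example \sum_{i=1}^{n} are part of the template, so Python tries to look up i=1 and n as variables. Neither is supplied, which raises KeyError: 'i=1'.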
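The failure can be reproduced without LangChain. The sketch below only imitates the two formatting passes visible in the traceback using plain str.format; the variable names are illustrative and this is not LangChain's actual implementation.

```python
# Plain-Python imitation of the two passes (assumption: str.format stands in
# for the formatter LangChain uses for template_format="f-string").
example = {"latex": r"\sum_{i=1}^{n}", "doc": "taking sum from 1 to n"}

# Pass 1: fill the per-example template. Braces inside the *values* are not parsed.
rendered_example = "Question: {latex}\n{doc}".format(**example)

# Join prefix, rendered example, and suffix into one template string.
combined = "\n\n".join(["Convert the latex", rendered_example, "User: {latex}\nAI: "])

# Pass 2: format the combined string. Now {i=1} looks like a replacement field.
try:
    combined.format(latex=r"\frac{a}{b}")
    error = None
except KeyError as exc:
    error = exc

print(repr(error))
```

The second pass is where the lookup of i=1 happens, which matches the KeyError reported in the issue.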

One potential solution to this issue is to escape the curly braces in your LaTeX string by doubling them. Here's how you can modify your code:

examples = [
  {
    "latex": """\\sum_{{i=1}}^{{n}}""",
    "doc": """taking sum from 1 to n"""
  }
]

By doubling the curly braces, you're instructing Python to treat them as literal characters in the string, rather than as placeholders for variables.
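If the examples come from an OCR pipeline, doubling braces by hand is impractical. A minimal sketch of doing it programmatically follows; escape_braces is a hypothetical helper (not part of LangChain), and plain str.format again stands in for the two formatting passes.

```python
def escape_braces(text: str) -> str:
    """Double every brace so str.format treats it as a literal character."""
    return text.replace("{", "{{").replace("}", "}}")


raw_examples = [
    {"latex": r"\sum_{i=1}^{n}", "doc": "taking sum from 1 to n"},
]

# Escape only the example values; the templates keep their real placeholders.
escaped_examples = [
    {key: escape_braces(value) for key, value in example.items()}
    for example in raw_examples
]

# Stand-in for the two passes: render the example, then format the joined template.
rendered = "Question: {latex}\n{doc}".format(**escaped_examples[0])
combined = "\n\n".join([rendered, "User: {latex}\nAI: "])
output = combined.format(latex=r"\frac{a}{b}")

print(output)
```

The user-supplied latex value needs no escaping here, because it is substituted during the final pass and is never parsed as a template itself. Note that formatting example_prompt on its own with escaped values will still show the doubled braces, since only the second pass collapses them.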

Please note that this is one potential solution, and there might be other ways to address this issue. If you believe this is a bug that could impact other users, you're encouraged to make a pull request with a proposed fix.

If this solution doesn't align with your experience, or if the issue persists, I recommend upgrading to the latest version of LangChain to ensure you have all the recent updates and bug fixes.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] commented 10 months ago

Hi, @anyangml! I'm Dosu, and I'm helping the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

Based on the information provided, you reported that the LangChain library's FewShotPromptTemplate does not correctly handle curly brackets in a LaTeX string. You received a response from me explaining that the issue arises from Python's string formatting mechanism and suggesting a potential solution of escaping the curly braces by doubling them. My response also recommended upgrading to the latest version of LangChain and making a pull request if the issue persists.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!