Closed tcaminel-pro closed 1 year ago
Glad you like it...
This is funny indeed...
I tested the code you provided, and it seems that GPT is trying to fix the typo (synomyms -> synonyms), so it won't parse.
For me, it failed on the first example already...
My advice would be to enable debug console logging, so that you can see what is going on there... although if you have, you might have missed it... I actually couldn't see it either (n and m are so similar) and was baffled for a while too, since I saw that the field was generated :)
The easiest way to enable verbose mode is to set an environment variable: "LANGCHAIN_DECORATORS_VERBOSE": "true" (I can't believe it's not documented here).
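A minimal sketch of setting that variable from Python itself (the variable name is taken from the comment above; setting it before importing the library is an assumption, do it however your deployment normally sets environment variables):

```python
import os

# Enable langchain-decorators verbose logging for this process.
# Set it before the library is imported so it is picked up at load time.
os.environ["LANGCHAIN_DECORATORS_VERBOSE"] = "true"
```

Alternatively, export it in your shell or put it in your `.env` file.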
Alternatively, you can also try PromptWatch which is natively supported for tracing all details.
Hi, can we close this? Did fixing the typo work for you?
Yes, it works! Thanks for the help. I've added "use a spell checker" as a recommendation for prompt engineering for my team...
Hi,
Not a bug but a suggestion: sometimes it's the LLM that creates typos in JSON keys; I saw that with LLama-2. And sometimes it 'corrects' the key name, as seen above.
To handle these cases, I've hacked 'align_fields_with_model' with a fuzzy match:
from typing import Type

from fuzzywuzzy import process
from pydantic import BaseModel

def align_fields_with_model(data: dict, model: Type[BaseModel]) -> dict:
    ...
        elif field_info.field_info.alias.lower() in data:
            value = data[field_info.field_info.alias.lower()]
        else:
            value = correct_typo_in_key(field, data)  # <==== Hack added by TC
        if not data_with_compressed_keys:
    ...

def correct_typo_in_key(field: str, data: dict):
    """Try to correct an incorrect key returned by the LLM by fuzzy-matching it against the expected schema."""
    spurious_key, score = process.extractOne(field, data.keys())
    return data[spurious_key] if score >= 80 else None
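The same idea can be sketched with only the standard library, using difflib instead of fuzzywuzzy (the 0.8 cutoff mirrors the score >= 80 threshold above; the function and example data here are illustrative, not from the library):

```python
import difflib

def correct_typo_in_key(field: str, data: dict, cutoff: float = 0.8):
    """Return the value whose key fuzzily matches `field`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(field, data.keys(), n=1, cutoff=cutoff)
    return data[matches[0]] if matches else None

# The LLM returned 'synomyms' instead of the expected 'synonyms' key:
llm_output = {"word": "happy", "synomyms": ["glad", "joyful"]}
print(correct_typo_in_key("synonyms", llm_output))  # → ['glad', 'joyful']
```

Note that difflib's ratio and fuzzywuzzy's score are computed differently, so the two thresholds are only roughly equivalent.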
Hey, I love that... Feel free to open PR, I'd accept this
My code base has quite diverged from yours, so opening a PR is not convenient.
BTW, I have found a nice way to find a JSON object anywhere in the LLM answer, using recursive regex. Here is my code:
import regex
from langchain.schema import OutputParserException

def json_finder(text: str) -> str:
    text = text.strip() + "}"  # add an extra } in case it's missing
    pattern = regex.compile(r"\{(?:[^{}]|(?R))*\}")  # recursive regexp
    r = pattern.findall(text)
    if (count := len(r)) != 1:
        raise OutputParserException(f"No or multiple JSON found ({count})", llm_output=text)
    return r[0]
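If you want to avoid the third-party regex dependency, a similar "find JSON anywhere in the answer" behaviour can be sketched with the standard library alone: scan for each opening brace and let json.JSONDecoder.raw_decode try to parse a complete object there. This is a sketch of the idea, not the library's actual parser:

```python
import json

def find_json_object(text: str):
    """Return the first complete JSON object found anywhere in `text`, or None."""
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch == "{":
            try:
                obj, _end = decoder.raw_decode(text, i)
                return obj  # parsed the object (including any nested braces)
            except json.JSONDecodeError:
                continue  # not a valid object starting here, keep scanning
    return None

answer = 'Sure! Here is the result: {"word": "happy", "synonyms": ["glad"]} Hope it helps.'
print(find_json_object(answer))  # → {'word': 'happy', 'synonyms': ['glad']}
```

A side benefit is that you get the parsed dict directly and handle nested braces and braces inside strings correctly, which a pure regex approach can struggle with.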
Why is it better than the native JsonOutputParser?
All you need to do is annotate the output with a ->dict return type, and it will be resolved automatically.
But you can use any output parser you wish... either one from LangChain, or build your own (just follow standard LangChain practice).
I use that code in the PydanticOutputParser as a replacement for this simpler regexp:
regex_pattern = r"\[.*\]" if self.as_list else r"\{.*\}"
match = re.search(regex_pattern, text.strip(), re.MULTILINE | re.IGNORECASE | re.DOTALL)
...
The ability to find the JSON anywhere in the LLM answer is useful, typically with LLaMA-2, which has the bad habit of "explaining" its outcome with filler text.
First, congratulations on the library. I like it. However, I've found a very strange behaviour.
When running the code hereafter, the first prompt works, but the second one raises an error.
The error seems related to the key name 'synomyms'. Any other name seems OK.