mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License

Not extracting specific JSON examples #34

Closed ArslanKAS closed 4 months ago

ArslanKAS commented 4 months ago

Describe the bug: The library works great for simple JSON responses from LLMs, but for some responses it misses the content completely.

The result this package returned to me: { \"Summary"

Expected return

{
    "Summary": "The customer, Joey, contacted Avanser to inquire about a specific vehicle model. He was interested in purchasing a silver-colored car and wanted to know if it was available. The agent checked the inventory and found that they had a similar model with a purple color. After discussing the availability of the silver model, the agent offered to allocate one of their salespeople to call Joey back to discuss further. The customer expressed his satisfaction with the service and requested doorstep delivery nationwide.",
    "Brand": "JD",
    "Model": "Silver One on the Back One" (Note: This is not a real vehicle model, but rather a description provided by the customer),
    "Primary topic": "Vehicle Availability",
    "Primary topic explanation": "The customer wanted to know if a specific silver-colored car was available for purchase.",
    "Secondary topic": "Trade-in Options and Financing",
    "Secondary topic explanation": "The customer mentioned that he had cash and was interested in trading in his current vehicle, but the agent clarified that they did not have the necessary information on hand.",
    "Issue resolution": "Partially resolved",
    "Issue resolution explanation": "The agent checked the inventory and found a similar model with a purple color, but could not confirm the availability of the silver model. The customer was offered an alternative solution to discuss further with one of their salespeople."
}

mangiucugna commented 4 months ago

Hi, I am about to release a fix for an issue that I think affects this bug as well

mangiucugna commented 4 months ago

0.15.6 is out, can you try it please?

ArslanKAS commented 4 months ago

> 0.15.6 is out, can you try it please?

Awesome. So quick update. I'm getting better results now.

{'Summary': 'The customer, Joey, contacted Avanser to inquire about a specific vehicle model. He was interested in purchasing a silver-colored car and wanted to know if it was available. The agent checked the inventory and found that they had a similar model with a purple color. After discussing the availability of the silver model, the agent offered to allocate one of their salespeople to call Joey back to discuss further. The customer expressed his satisfaction with the service and requested doorstep delivery nationwide.',
 'Brand': 'JD',
 'Model': 'Silver One on the Back One',
 'Note': 'This is not a real vehicle model',
 'Primary topic': 'Vehicle Availability',
 'Primary topic explanation': 'The customer wanted to know if a specific silver-colored car was available for purchase.',
 'Secondary topic': 'Trade-in Options and Financing',
 'Secondary topic explanation': 'The customer mentioned that he had cash and was interested in trading in his current vehicle, but the agent clarified that they did not have the necessary information on hand.',
 'Issue resolution': 'Partially resolved',
 'Issue resolution explanation': 'The agent checked the inventory and found a similar model with a purple color, but could not confirm the availability of the silver model. The customer was offered an alternative solution to discuss further with one of their salespeople.'}

I'll keep you informed if I get stuck on some weird LLM JSON response.

ArslanKAS commented 4 months ago

Hi @mangiucugna, could you please look at the following example? I got some responses in the following format from OpenAI:

"""
```{json}
{
  "Summary": "The call involved a customer following up on a request regarding account extensions and splits. The agent provided updates and escalated the issue for further investigation.",
  "Primary topic": "Account extensions and splits",
  "Primary topic explanation": "Customer inquired about extending account terms and splitting an account.",
  "Secondary topic": "Escalation and investigation",
  "Secondary topic explanation": "Agent escalated the issue for further investigation and provided updates to the customer.",
  "Issue resolution": "Partially resolved",
  "Issue resolution explanation": "The agent raised the issue for investigation and provided the customer with a reference number for tracking. The resolution is pending further updates from the team."
}
```
"""

The error is:



During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[5], line 15
      1 output = """
      2 ```{json}
      3 {
   (...)
     12 ```
     13 """
---> 15 repair_json(output)

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/json_repair/json_repair.py:464, in repair_json(json_str, return_objects, skip_json_loads, logging)
    462         parsed_json = json.loads(json_str)
    463     except json.JSONDecodeError:
--> 464         parsed_json = parser.parse()
    465 # It's useful to return the actual object instead of the json string, it allows this lib to be a replacement of the json library
    466 if return_objects or logging:

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/json_repair/json_repair.py:46, in JSONParser.parse(self)
     44 def parse(self) -> Union[Dict[str, Any], List[Any], str, float, int, bool, None]:
     45     if self.logger["log_level"] == "none":
---> 46         return self.parse_json()
     47     else:
     48         return self.parse_json(), self.logger["log"]

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/json_repair/json_repair.py:60, in JSONParser.parse_json(self)
     58 elif char == "{":
     59     self.index += 1
---> 60     return self.parse_object()
     61 # <array> starts with '['
     62 elif char == "[":

File ~/anaconda3/envs/python3/lib/python3.10/site-packages/json_repair/json_repair.py:154, in JSONParser.parse_object(self)
    152 # Reset context since our job is done
    153 self.reset_context()
--> 154 obj[key] = value
    156 if (self.get_char_at() or "") in [",", "'", '"']:
    157     self.index += 1

TypeError: unhashable type: 'dict'

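For anyone reading the last frame: the failure is at `obj[key] = value`, and Python raises this kind of `TypeError` whenever a dict is used as a dictionary key. Below is a minimal, library-independent sketch of that behaviour. The helper name `try_assign` is made up for illustration, and the guess that the parser built a dict out of the `{json}` tag is an assumption, not something the traceback confirms.

```python
def try_assign(key):
    """Attempt obj[key] = value and report what Python raises, if anything."""
    obj = {}
    try:
        obj[key] = "value"
    except TypeError as exc:
        # Dict keys must be hashable; a dict is mutable and therefore is not.
        return str(exc)
    return "ok"


# A string key is hashable, so the assignment succeeds.
string_key_result = try_assign("Summary")

# A dict used as a key reproduces the error class seen in the traceback.
dict_key_result = try_assign({"json": None})

print(string_key_result)
print(dict_key_result)
```
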
mangiucugna commented 4 months ago

Hi @ArslanKAS, are you using the latest version (0.16.3)? I can't reproduce this. In 0.16 I added better support for strange characters at the beginning of the string, and I bet that is the cause.
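
If upgrading is not an option, one possible workaround is to skip the leading junk yourself before parsing. The sketch below uses only the standard library and is not how json_repair handles it internally. The function name `extract_first_json_object` is hypothetical, and it only helps when the JSON inside the wrapper is already valid.

```python
import json


def extract_first_json_object(text: str) -> str:
    """Return the first substring starting at '{' that parses as valid JSON."""
    decoder = json.JSONDecoder()
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        try:
            # raw_decode parses one JSON value starting at `start` and
            # returns (object, index just past the end of that value).
            _, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # For example, the "{json}" tag is not valid JSON: keep scanning.
            continue
        return text[start:end]
    raise ValueError("no valid JSON object found")


# A shortened stand-in for the fenced response shown earlier in the thread.
wrapped = (
    "```{json}\n"
    '{"Summary": "Call about account splits", '
    '"Issue resolution": "Partially resolved"}\n'
    "```"
)
payload = json.loads(extract_first_json_object(wrapped))
print(payload["Issue resolution"])
```

This deliberately does no repair at all, so truncated or malformed JSON still needs the library.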

ArslanKAS commented 4 months ago

This is amazing, @mangiucugna. Sorry, I was using 0.15.6. I just updated and it works like magic. Thank you so much.

ArslanKAS commented 4 months ago

Hi again @mangiucugna

I'm using the updated version of your library and I've run into an odd issue.

Phi-3 mini from Microsoft generates the following output for my prompt:

"""
{

    "Summary": "The customer inquired about the availability of a specific vehicle model and its pricing. The agent from Avanser provided information on their wide selection, transparent pricing, and test drive options. They also discussed financing solutions and confirmed that the desired vehicle was available for purchase.",

    "Brand": "Avanser",
    "Model": "Corolla",
ran a typo in 'model' name, assuming it should be 'Civic',
    "Primary topic": "Vehicle Availability and Pricing",
    "Primary topic explanation": "The customer wanted to know if the specific vehicle model was available and its price.",
    "Secondary topic": "Test Drive Options and Financing Solutions",
    "Secondary topic explanation": "The agent discussed test drive options, financing solutions, and confirmed availability of the desired vehicle.",
    "Issue resolution": "Resolved",
    "Issue resolution explanation": "The customer's inquiry about the vehicle model was addressed by confirming its availability and discussing pricing and additional services."

}

Correction: The 'Model' field should be corrected to 'Civic'. However, since this is a hypothetical scenario, I will maintain the original typo for illustrative purposes. If an actual correction were needed, it would look like this:

"...",
    "Model": "Corolla",
    "Model": "Civic",
}

Note: In real-world applications, such corrections should be made to ensure data accuracy and integrity.
"""

When I run json_repair on it, the output becomes:

'{"Summary": "The customer inquired about the availability of a specific vehicle model and its pricing. The agent from Avanser provided information on their wide selection, transparent pricing, and test drive options. They also discussed financing solutions and confirmed that the desired vehicle was available for purchase.", "Brand": "Avanser", "Model": "Corolla", "opic": "Vehicle Availability and Pricing", "Primary topic explanation": "The customer wanted to know if the specific vehicle model was available and its price.", "Secondary topic": "Test Drive Options and Financing Solutions", "Secondary topic explanation": "The agent discussed test drive options, financing solutions, and confirmed availability of the desired vehicle.", "Issue resolution": "Resolved", "Issue resolution explanation": "The customer\'s inquiry about the vehicle model was addressed by confirming its availability and discussing pricing and additional services."}'

The issues:

mangiucugna commented 4 months ago

Thanks, good catch. I found the issue: because of that comment in the middle it will spit out a garbage key, but the object will be consistent if you ignore the weird key.

{"Summary": "The customer inquired about the availability of a specific vehicle model and its pricing. The agent from Avanser provided information on their wide selection, transparent pricing, and test drive options. They also discussed financing solutions and confirmed that the desired vehicle was available for purchase.", "Brand": "Avanser", "Model": "Corolla", "model' name, assuming it should be": "ivic'", "Primary topic": "Vehicle Availability and Pricing", "Primary topic explanation": "The customer wanted to know if the specific vehicle model was available and its price.", "Secondary topic": "Test Drive Options and Financing Solutions", "Secondary topic explanation": "The agent discussed test drive options, financing solutions, and confirmed availability of the desired vehicle.", "Issue resolution": "Resolved", "Issue resolution explanation": "The customer's inquiry about the vehicle model was addressed by confirming its availability and discussing pricing and additional services."}
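
Ignoring the weird key can be done mechanically when the expected fields are known in advance. Here is a minimal sketch, assuming an allow-list of field names taken from the prompt. `EXPECTED_KEYS` and `keep_expected_keys` are illustrative names, not part of json_repair, and the dict is a shortened copy of the output above.

```python
# Field names the prompt asked for (an assumption for this sketch).
EXPECTED_KEYS = {
    "Summary",
    "Brand",
    "Model",
    "Primary topic",
    "Primary topic explanation",
    "Secondary topic",
    "Secondary topic explanation",
    "Issue resolution",
    "Issue resolution explanation",
}


def keep_expected_keys(obj: dict, expected: set) -> dict:
    """Drop every key that is not in the allow-list, preserving order."""
    return {key: value for key, value in obj.items() if key in expected}


# Shortened version of the repaired object, including the garbage key.
repaired = {
    "Brand": "Avanser",
    "Model": "Corolla",
    "model' name, assuming it should be": "ivic'",
    "Primary topic": "Vehicle Availability and Pricing",
    "Issue resolution": "Resolved",
}

cleaned = keep_expected_keys(repaired, EXPECTED_KEYS)
print(list(cleaned))
```

The same set can also be used the other way round, to detect fields that went missing after a repair.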

ArslanKAS commented 4 months ago

Sorry to bother you again, Mr. @mangiucugna. This time I was using Mistral and it gave me this response:

{
"Call_Analysis": {
"Professionalism": {
"Result": "Yes",
"Explanation": "The agent maintained a formal and courteous tone throughout the call, addressing the customer respectfully."
},
"Empathy": {
"Result": "Yes",
"Explanation": "The agent acknowledged the customer's concerns and expressed understanding and compassion towards their situation."
},
"Friendliness": {
"Result": "Yes",
"Explanation": "The agent used a friendly and approachable tone, making the customer feel at ease and comfortable during the call."
},
"Understanding": {
"Result": "Yes",
"Explanation": "The agent listened attentively to the customer's issue and asked clarifying questions to ensure they fully understood the situation."
}
}
}

But json_repair turned it into this:

 {
"Call\_Analysis": {
"Professionalism": {
"Result": "Yes",
"Explanation": "The agent maintained a professional tone and demeanor throughout the call, addressing the customer in a respectful and courteous manner."
},
"Empathy": {
"Result": "Yes",
"Explanation": "The agent expressed empathy towards the customer's situation by acknowledging their concerns and offering reassurances and solutions."
},
"Friendliness": {
"Result": "Yes",
"Explanation": "The agent was friendly and approachable, using a warm and welcoming tone to engage with the customer and build rapport."
},
"Understanding": {
"Result": "Yes",
"Explanation": "The agent demonstrated a clear understanding of the customer's issue by actively listening, asking clarifying questions, and providing accurate information."
}
}
}

The only difference is the backslash "\" it added to "Call_Analysis", turning it into "Call\_Analysis".

mangiucugna commented 4 months ago

Hi, I just tested the input with latest (0.18.0) and I got:

{"Call_Analysis": {"Professionalism": {"Result": "Yes", "Explanation": "The agent maintained a formal and courteous tone throughout the call, addressing the customer respectfully."}, "Empathy": {"Result": "Yes", "Explanation": "The agent acknowledged the customer's concerns and expressed understanding and compassion towards their situation."}, "Friendliness": {"Result": "Yes", "Explanation": "The agent used a friendly and approachable tone, making the customer feel at ease and comfortable during the call."}, "Understanding": {"Result": "Yes", "Explanation": "The agent listened attentively to the customer's issue and asked clarifying questions to ensure they fully understood the situation."}}}

Not sure why you got a different output tbh
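
One thing worth checking is whether the backslash is already present in the model's raw text, since `\_` is how an underscore is escaped in Markdown. The thread does not establish that this is what happened, so treat it as an assumption. If it is, a small pre-processing step like the sketch below removes it before parsing. The function name `unescape_markdown_underscores` is hypothetical and only the standard library is used.

```python
import json


def unescape_markdown_underscores(text: str) -> str:
    """Replace the Markdown-style escape \\_ with a plain underscore.

    Caveat: this would also alter a legitimately escaped backslash that
    happens to be followed by an underscore inside a string value.
    """
    return text.replace("\\_", "_")


# Raw model text containing a backslash before the underscore.
raw = r'{"Call\_Analysis": {"Professionalism": {"Result": "Yes"}}}'

# The strict parser rejects the raw text because \_ is not a JSON escape.
try:
    json.loads(raw)
    strict_parse_failed = False
except json.JSONDecodeError:
    strict_parse_failed = True

normalized = json.loads(unescape_markdown_underscores(raw))
print(strict_parse_failed)
print(list(normalized))
```
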