mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License
826 stars 48 forks source link

[Bug]: Quotes are not added after the last value in a JSON when the value has a Comma #73

Closed strentom closed 3 weeks ago

strentom commented 3 weeks ago

Version of the library

0.29.5

Describe the bug

A quote is not added after the last value in a JSON when there is a comma (,) in the value.

How to reproduce

This: json_repair.loads("""{\n"response": "Hello Looks like .. , but not ...}\n""") is parsed into: {'response': 'Hello Looks like ..'}, truncating everything after the comma.

When you remove the comma, the JSON is parsed correctly and a quote (") is added before the }.

Expected behavior

This: json_repair.loads("""{\n"response": "Hello Looks like .. , but not ...}\n""") Results in this correct parsed dictionary: {'response': 'Hello Looks like .. , but not ...'}

mangiucugna commented 3 weeks ago

Hi! This type of corner case is always a bit hard to handle. I found a workaround but it will work only if it's the last element of the object, otherwise the behavior will be the same as you reported. This is because is virtually impossible to understand if the LLM is writing gibberish or not if it's in the middle of an object