mangiucugna / json_repair

A Python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License
826 stars 48 forks

Missing left quotes in numbers are not parsed properly #23

Closed AloXado320 closed 6 months ago

AloXado320 commented 6 months ago

Describe the bug: If the opening (left) quote is missing from a JSON value, json_repair does not repair it properly.

To Reproduce: Run json_repair on this JSON file:

{
  "words": abcdef",
  "numbers": 12345",
  "words2": ghijkl"
}

It gets parsed like this:

{'words': 'abcdef', 'numbers': 12345, ',\n ': 'ords2', 'ghijkl': ''}

Expected behavior: The output should be {'words': 'abcdef', 'numbers': '12345', 'words2': 'ghijkl'}

mangiucugna commented 6 months ago

Hi! Thanks for reporting the bug. One note about numbers in canonical JSON, though: numbers are unquoted, so the expected output here is:

{ "words": "abcdef", "numbers": 12345, "words2": "ghijkl" }

So the library should fix this by removing the dangling " and not consuming the following key.
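
To make the intended behavior concrete, here is a minimal standard-library sketch of the repair rule described above. It is not json_repair's actual implementation: the helper name `repair_dangling_quotes` and the regular expression are hypothetical, and the sketch assumes that no string value contains a colon. A value that follows a colon, lacks its opening quote, and ends in a dangling `"` is kept bare if it is already a valid JSON literal (such as a number), and wrapped in quotes otherwise.

```python
import json
import re

# Hypothetical pattern: a colon, optional whitespace, then an unquoted token
# that ends with a dangling double quote. Toy assumption: no colons in strings.
_DANGLING = re.compile(r'(:\s*)([^"\s,\[\]{}][^",\n]*)"')


def _fix(match):
    prefix, token = match.group(1), match.group(2)
    try:
        # If the bare token is already valid JSON (number, true, false, null),
        # just drop the dangling quote.
        json.loads(token)
        return prefix + token
    except ValueError:
        # Otherwise treat it as a string that lost its opening quote.
        return prefix + json.dumps(token)


def repair_dangling_quotes(text):
    """Illustrative helper, not part of json_repair."""
    return _DANGLING.sub(_fix, text)


broken = '''{
  "words": abcdef",
  "numbers": 12345",
  "words2": ghijkl"
}'''

repaired = json.loads(repair_dangling_quotes(broken))
print(repaired)
# {'words': 'abcdef', 'numbers': 12345, 'words2': 'ghijkl'}
```

Note that `numbers` comes back as the integer 12345 rather than the string '12345', and the `words2` key is no longer swallowed by the preceding value.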