mangiucugna / json_repair

A Python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License
826 stars, 48 forks

Problem with missing Value or comment in Key-Value pair #7

Closed: ElvisUntot closed this issue 11 months ago

ElvisUntot commented 11 months ago

Describe the bug

When there is a comment in the JSON or a value is missing, the tool creates a new key-value pair.

To Reproduce

{ "value_1": true, SHOULD_NOT_EXIST "value_2": "data" }

TRANSFORMS TO

{ "value_1": true, "SHOULD_NOT_EXIST\n\n": "alue_2", "": "data", "}": "" }

AND

{ "value_1": "value_2": "data" }

TRANSFORMS TO

{ "value_1": "value_2", "": "data", "}": "" }

Expected behavior

These are the two expected results:

{ "value_1": true, "value_2": "data" }

{ "value_1": "", "value_2": "data" }
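For reference, Python's standard `json` module rejects both reported inputs and accepts both expected results (with a comma between the two members of the second one). A minimal standard-library check; the helper name `first_json_error` is hypothetical and not part of json_repair:

```python
import json


def first_json_error(text):
    """Return (message, offset) of the first error json.loads reports, or None if text is valid JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.msg, err.pos
    return None


broken_inputs = [
    '{ "value_1": true, SHOULD_NOT_EXIST "value_2": "data" }',  # stray unquoted word
    '{ "value_1": "value_2": "data" }',                         # missing value / comma
]
expected_outputs = [
    '{ "value_1": true, "value_2": "data" }',
    '{ "value_1": "", "value_2": "data" }',
]

for text in broken_inputs + expected_outputs:
    # Prints the decoder's complaint for invalid input, None for valid JSON.
    print(repr(text), "->", first_json_error(text))
```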


Additional context

The JSON files were created with Llama 2 and Mistral.

mangiucugna commented 11 months ago

Hi, thanks for the report.

I tested the two strings, and for the second one I don't get the same output; instead I get: {"value_1": "value_2", "": "data"}

That is, IMO, fair enough: both {"value_1": "value_2", "": "data"} and {"value_1": "", "value_2": "data"} are possible fixes, and the parser is designed to fix the leftmost token first.
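To illustrate why either repair is defensible: both candidates are valid JSON, yet they decode to different objects, so the broken input alone cannot decide between them. A minimal standard-library check (variable names are illustrative only):

```python
import json

broken = '{"value_1": "value_2": "data"}'

# The two repairs mentioned above; both are valid JSON.
leftmost_first = '{"value_1": "value_2", "": "data"}'  # "value_2" is kept as value_1's value
empty_value = '{"value_1": "", "value_2": "data"}'     # value_1's value is treated as missing

try:
    json.loads(broken)
    broken_is_valid = True
except json.JSONDecodeError:
    broken_is_valid = False

a = json.loads(leftmost_first)
b = json.loads(empty_value)
print(broken_is_valid, a, b, a == b)
```

A parser that commits to the leftmost token first naturally ends up with the first object.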

The first case, though, is definitely problematic.

mangiucugna commented 11 months ago

OK, so I have a fix for this that passes the following test cases:

{"value_1": true, SHOULD_NOT_EXIST "value_2": "data" AAAA } ==> {'value_1': True, 'value_2': 'data'}

and

{"value_1": "value_2": "data"} ==> {"value_1": "value_2", "data": ""}
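The behavior in these two test cases can be approximated by a leftmost-first pass over a token stream: drop stray unquoted words, commas and colons where a key is expected, and fill a missing value with an empty string. This is a hypothetical toy for flat objects only; the function `toy_repair_flat_object` and its rules are assumptions, not json_repair's actual implementation, and it handles neither nesting nor malformed escapes:

```python
import json
import re

# Quoted strings, structural characters, numbers, barewords, then any other single character.
TOKEN_RE = re.compile(r'"(?:[^"\\]|\\.)*"|[{}:,]|-?\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')
NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')
LITERALS = {"true": True, "false": False, "null": None}


def _is_string(tok):
    # Only the quoted-string alternative yields tokens of length >= 2 starting with '"'.
    return len(tok) >= 2 and tok.startswith('"')


def _parse_value(tokens, i):
    """Read one scalar value at tokens[i]; return (value, next_index)."""
    if i >= len(tokens) or tokens[i] in (",", "}"):
        return "", i  # value is missing: fill with an empty string
    tok = tokens[i]
    if _is_string(tok):
        return json.loads(tok), i + 1
    if NUMBER_RE.fullmatch(tok):
        return (float(tok) if "." in tok else int(tok)), i + 1
    if tok in LITERALS:
        return LITERALS[tok], i + 1
    return "", i  # unexpected token: the caller will skip it


def toy_repair_flat_object(text):
    """Hypothetical leftmost-first repair of a flat (non-nested) JSON object."""
    tokens = TOKEN_RE.findall(text)
    result = {}
    i = 0
    while i < len(tokens) and tokens[i] != "{":
        i += 1  # ignore anything before the opening brace
    i += 1
    while i < len(tokens) and tokens[i] != "}":
        if not _is_string(tokens[i]):
            i += 1  # stray bareword, comma or colon where a key is expected: drop it
            continue
        key = json.loads(tokens[i])
        i += 1
        if i < len(tokens) and tokens[i] == ":":
            value, i = _parse_value(tokens, i + 1)
        else:
            value = ""  # key with neither colon nor value
        result[key] = value
    return result


print(toy_repair_flat_object('{"value_1": true, SHOULD_NOT_EXIST "value_2": "data" AAAA }'))
print(toy_repair_flat_object('{"value_1": "value_2": "data"}'))
```

In this toy, once `"value_2"` has been accepted as the value of `value_1` it is never reinterpreted, so the following colon is dropped and `"data"` becomes a key with an empty value, matching the second test case.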

I will release this as 0.3.0.

ElvisUntot commented 11 months ago

Wow, thanks!