mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License

Not working for basic example #33

Closed: dcollien closed this issue 5 months ago

dcollien commented 5 months ago

Describe the bug

The following "broken" JSON:

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar "foobar" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

is repaired well by: https://josdejong.github.io/jsonrepair/

but not by this library.

To Reproduce

>>> bad_json
'[\n    {\n        "foo": "Foo bar baz",\n        "tag": "#foo-bar-baz"\n    },\n    {\n        "foo": "foo bar "foobar" foo bar baz.",\n        "tag": "#foo-bar-foobar"\n    }\n]'
>>> json_repair.loads(bad_json)
[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]

Expected behavior

Expected output:

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar \"foobar\" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

(as per https://josdejong.github.io/jsonrepair/)

Actual output instead:

[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]
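
For readers following along, here is a minimal sketch of one lookahead heuristic that produces the expected output for this particular input. It is not how json_repair or jsonrepair is implemented; `escape_inner_quotes` is a hypothetical helper written only for illustration. The assumption is that a double quote inside a string really closes the string only when the next non-whitespace character is `,`, `:`, `}`, `]`, or the end of the input; any other quote is treated as a stray inner quote and gets escaped.

```python
import json


def escape_inner_quotes(text: str) -> str:
    """Hypothetical helper: escape stray double quotes inside JSON strings.

    Assumption (not taken from any library): a quote closes the current
    string only if the next non-whitespace character is one of , : } ]
    or the input ends there.
    """
    out = []
    in_string = False
    i = 0
    n = len(text)
    while i < n:
        ch = text[i]
        if not in_string:
            # Outside a string, a quote always opens a new string.
            if ch == '"':
                in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < n:
            # Keep existing escape sequences untouched.
            out.append(ch)
            out.append(text[i + 1])
            i += 1
        elif ch == '"':
            # Look past whitespace to decide whether this quote closes the string.
            j = i + 1
            while j < n and text[j] in " \t\r\n":
                j += 1
            if j == n or text[j] in ",:}]":
                in_string = False
                out.append(ch)
            else:
                out.append('\\"')
        else:
            out.append(ch)
        i += 1
    return "".join(out)


# The input from the report above.
bad_json = '[\n    {\n        "foo": "Foo bar baz",\n        "tag": "#foo-bar-baz"\n    },\n    {\n        "foo": "foo bar "foobar" foo bar baz.",\n        "tag": "#foo-bar-foobar"\n    }\n]'

# The strict standard-library parser rejects the raw input.
try:
    json.loads(bad_json)
    strict_parse_failed = False
except json.JSONDecodeError:
    strict_parse_failed = True

repaired = json.loads(escape_inner_quotes(bad_json))
print(repaired[1]["foo"])  # foo bar "foobar" foo bar baz.
```

The heuristic is deliberately narrow: a string that legitimately contains a quote followed by a comma or colon (for example `"he said "hi", then left"`) would be cut short, so a real repair library needs more context than this single-character lookahead.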

mangiucugna commented 5 months ago

Super interesting, thanks for reporting. Somehow the whitespace is tripping up the library. I will take a look.

mangiucugna commented 5 months ago

0.15.6 is out, can you try it please? This example has now been added to the tests, and they are all green.

dcollien commented 5 months ago

Looks great, thank you!