Open tybalex opened 4 months ago
Another example:
Input:
{"name": "run", "args": {"param1": ["This is C(2)", This is F(3)]}}
Output:
{"name": "run", "args": {"param1": ["This is C(2)", 3]}}
Thanks for your input.
Just curious: did you encounter this broken JSON in a real world example, or did you make it up?
The limitation originates in the code that identifies the end of a string when the end quote is missing. It currently stops at the next delimiter, and the delimiters include `(` and `)`. That is needed to identify MongoDB data types and JSONP notation. To prevent, for example, `This is F(3)` from being identified as a MongoDB/JSONP function and replaced with `3`, we should refine the logic, for example by checking for the absence of spaces in the name and/or checking against a list of known MongoDB data types.
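The two checks suggested above could be sketched roughly as follows. This is only an illustration, not the code from jsonrepair: the helper `looksLikeFunctionName`, the `strict` flag, and the `KNOWN_MONGO_TYPES` list (a small subset of MongoDB shell types) are all hypothetical names chosen for this example.

```typescript
// Hypothetical helper, not part of jsonrepair: decides whether the text
// in front of an opening parenthesis looks like a MongoDB/JSONP function name.

// Illustrative subset of MongoDB data type wrappers (assumption: the real list is longer).
const KNOWN_MONGO_TYPES = new Set([
  'ObjectId',
  'NumberLong',
  'NumberInt',
  'NumberDecimal',
  'ISODate',
  'Timestamp'
])

// A plain identifier: starts with a letter, underscore, or dollar sign
// and contains no whitespace.
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/

function looksLikeFunctionName(name: string, strict = false): boolean {
  if (!IDENTIFIER.test(name)) {
    return false // e.g. "This is F" contains spaces
  }
  // In strict mode, only accept names from the known list.
  return strict ? KNOWN_MONGO_TYPES.has(name) : true
}

console.log(looksLikeFunctionName('This is F'))      // false
console.log(looksLikeFunctionName('NumberLong'))     // true
console.log(looksLikeFunctionName('callback'))       // true
console.log(looksLikeFunctionName('callback', true)) // false
```

The non-strict mode keeps arbitrary JSONP callback names working, while the strict mode trades that flexibility for fewer false positives.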
Via 58fe64ce62d7a3840f427df2dabb1fa748e540de I've made the detection of MongoDB/JSONP function calls more robust. This solves the issue of `jsonrepair` silently changing an unquoted string containing parentheses: the library now consistently throws an exception.
The fix is not yet published.
Repairing an unquoted string containing parentheses would be a next step.
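Since the library now throws instead of silently altering such input, callers may want to handle that case explicitly. Below is a minimal sketch, assuming the repair function is passed in as a parameter; `parseWithRepair` and the `Repair` type are hypothetical names, and the stub `alwaysFails` only stands in for a repair function that rejects its input.

```typescript
// Any function that takes broken JSON text and returns repaired JSON text,
// throwing when it cannot repair the input.
type Repair = (text: string) => string

// Hypothetical wrapper: parse as-is first, then try the repair function,
// and return undefined when both attempts fail.
function parseWithRepair(text: string, repair: Repair): unknown {
  try {
    return JSON.parse(text)
  } catch {
    // Not valid JSON, fall through to the repair attempt.
  }
  try {
    return JSON.parse(repair(text))
  } catch {
    return undefined
  }
}

// Stub standing in for a repair function that throws on unrepairable input.
const alwaysFails: Repair = () => {
  throw new Error('cannot repair')
}

console.log(parseWithRepair('{"a": 1}', alwaysFails))  // { a: 1 }
console.log(parseWithRepair('{"a": F(3', alwaysFails)) // undefined
```

In practice the `jsonrepair` function exported by the library would be passed as the `repair` argument.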
Hi @josdejong, thank you for the quick response! I really like this tool.
To answer your question: this is a real-world example. It was produced by an LLM (large language model) we trained, and I was trying to use `jsonrepair` to fix some of the broken JSON produced by the LLM.
Thanks, good to know.
The Problem
When I tried to use the library to fix the following string, it failed:
Input:
Output:
========================
However, without the '(' or ')' character, it produces a correct fix:
Input:
Output:
Is this behavior expected, or is it a bug?