Working with LLMs (Llama) and having it produce some output in JSON format. There is an edge case I have encountered when working with chinese headings where it will often produce double quotes on the "title" property in the JSON string. This breaks the formatting.
Using the json_repair library should fix this, but instead it returns an empty string in the title.
Version of the library
0.25.2
Describe the bug
Working with LLMs (Llama) and having it produce some output in JSON format. There is an edge case I have encountered when working with chinese headings where it will often produce double quotes on the "title" property in the JSON string. This breaks the formatting.
Using the json_repair library should fix this, but instead it returns an empty string in the title.
Output: [{"chapter_id": 1, "starting_time_stamp": "0:00:00", "title": ""}, {"chapter_id": 2, "starting_time_stamp": "0:01:00", "title": ""}, {"chapter_id": 3, "starting_time_stamp": "0:02:00", "title": ""}, {"chapter_id": 4, "starting_time_stamp": "0:04:00", "title": ""}, {"chapter_id": 5, "starting_time_stamp": "0:06:00", "title": ""}, {"chapter_id": 6, "starting_time_stamp": "0:09:00", "title": ""}, {"chapter_id": 7, "starting_time_stamp": "0:11:00", "title": ""}]
How to reproduce
Use the following JSON.
Calling code:
Expected behavior
Expected the removal of one the quotes in the starting of the "title" object string.