mangiucugna / json_repair

A Python module to repair invalid JSON, commonly used to parse the output of LLMs
https://pypi.org/project/json-repair/
MIT License
826 stars, 48 forks

support for number-like strings #16

tmcdonnell87 closed this 7 months ago

tmcdonnell87 commented 7 months ago

Issue #


What is the current behavior?

Strings that begin with a number and contain other number-like characters (e.g., 10-20, 1.1.1) are not parsed successfully.

What is the new behavior?

They are! :-)
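A minimal sketch of the idea, for illustration only. It is not json_repair's actual parser, the name parse_number_like is hypothetical, and it assumes the desired result is to keep such values as strings. It scans the run of number-like characters, tries int and then float, and falls back to the raw text when neither conversion succeeds, so 10-20 and 1.1.1 come back as strings instead of causing a failure.

```python
# Characters that can appear in a JSON number (plus "-" inside values like 10-20)
NUMBER_CHARS = set("0123456789-.eE+")


def parse_number_like(text: str, start: int = 0):
    """Hypothetical sketch: read a number-like token beginning at `start`.

    Returns (value, end_index). The value is an int or float when the token
    is a valid number, otherwise the raw token as a string.
    """
    end = start
    # Consume characters until a structural character (",", "}", "]", space...)
    while end < len(text) and text[end] in NUMBER_CHARS:
        end += 1
    token = text[start:end]
    try:
        # Plain integers such as "42" or "-7"
        return int(token), end
    except ValueError:
        pass
    try:
        # Decimals and exponents such as "3.5" or "2e3"
        return float(token), end
    except ValueError:
        # Number-like but not a number, e.g. "10-20" or "1.1.1": keep as string
        return token, end


print(parse_number_like("10-20"))           # ('10-20', 5)
print(parse_number_like("1.1.1"))           # ('1.1.1', 5)
print(parse_number_like('{"a": 7}', 6))     # (7, 7)
```

Stopping at any character outside NUMBER_CHARS is what lets the sketch end the token at a comma or closing brace; a real repair parser has many more cases to handle (quotes, whitespace, truncated input).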

Does this introduce a breaking change?

Other information

See tests

mangiucugna commented 7 months ago

Thanks a lot! I will release 0.9.0 now.

mezka commented 6 months ago

Instead of just returning the string, could we have a function like lambda ctx: ctx['str'] that receives the parser context and returns the string by default, but that we can replace through the main json_repair interface?


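```python
from json_repair import repair_json

def my_custom_value_error_repair(parser_context):
    # Proposed hook: return the replacement value for the unparsable number-like string
    return 'I COULD HAVE REPAIRED THIS STRING HERE BUT I DIDNT'

# on_parse_number_value_error is the proposed keyword argument
repaired_json = repair_json(my_json, on_parse_number_value_error=my_custom_value_error_repair)
```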

That way, if my dataset has a specific error that repeats itself, I can write a custom formatter to recover from that error.
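A minimal sketch of how such a hook could be wired into number parsing. Every name here (parse_number_token, default_hook, range_to_list, the "str" and "position" context keys) is hypothetical and not part of json_repair. The default hook behaves like the proposed lambda ctx: ctx['str'], and a dataset-specific hook can override it.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical hook type: receives the parser context, returns the value to use
ErrorHook = Callable[[Dict[str, Any]], Any]


def default_hook(ctx: Dict[str, Any]) -> Any:
    # Equivalent to the proposed default `lambda ctx: ctx['str']`
    return ctx["str"]


def parse_number_token(token: str, position: int,
                       on_parse_number_value_error: Optional[ErrorHook] = None) -> Any:
    """Convert `token` to int or float, delegating failures to the hook."""
    hook = on_parse_number_value_error or default_hook
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            continue
    # Neither conversion worked: hand the failing text and its position to the hook
    return hook({"str": token, "position": position})


def range_to_list(ctx: Dict[str, Any]) -> Any:
    # Hypothetical dataset-specific repair: turn "10-20" into [10, 20]
    low, sep, high = ctx["str"].partition("-")
    if sep and low.isdigit() and high.isdigit():
        return [int(low), int(high)]
    return ctx["str"]  # otherwise fall back to the default behavior


print(parse_number_token("10-20", 0))                 # 10-20
print(parse_number_token("10-20", 0, range_to_list))  # [10, 20]
```

Passing the position alongside the raw text is one way to give a custom hook (or an LLM prompt built from it) more context; how much context to expose is exactly the open design question in this thread.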

Also, depending on how much context is passed, we could have an LLM try to fix the error for us.

mezka commented 6 months ago

I'm the author of https://github.com/mezka/afipcaeqrdecode, a utility that extracts and decodes government-issued metadata from Argentine invoices and depends on json_repair.

mangiucugna commented 6 months ago

@mezka I am not 100% clear on what behavior you would expect exactly. Can you open a new issue and include some examples, such as a sample JSON and the expected output? Cheers