nikolaysm closed this issue 6 months ago
Hi @nikolaysm, can you open a new issue for that? The problem is that the right format is `<h3 id='title'>`, so I am not 100% sure I can support this use case, but it is definitely a distinct use case from this issue.
_Originally posted by @mangiucugna in https://github.com/mangiucugna/json_repair/issues/20#issuecomment-2066367077_
After removing the attribute `id="title"`, I still experience the same issue. Any suggestions on how to handle "Passie voor techniek" within the value?
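from json_repair import repair_json

# The quotes around "Passie voor techniek" are not escaped, so the string value ends early
json_str = '{\n"html": "<h3 >Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
data = repair_json(json_str, return_objects=True)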
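For anyone following along, here is a minimal sketch of why the string above is rejected before any repair is attempted. It uses only the standard library `json` module; the `error` variable is just an illustrative name.

```python
import json

json_str = '{\n"html": "<h3 >Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'

try:
    json.loads(json_str)
    error = None
except json.JSONDecodeError as exc:
    # The parser treats the quote before "Passie" as the end of the value,
    # then fails because the next character is not a delimiter.
    error = exc

print(error)
```

The strict parser has no way to know that the inner quotes belong to the text, which is the gap a repair step has to fill.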
So this is a whole problem of its own. My idea would be to do something dirty here and use a regex (I know, I know) to treat anything inside HTML tags as one string, and then replace the offending quote characters.
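A rough sketch of what that regex idea could look like, as a preprocessing step run before parsing. This is not code from json_repair; `HTML_SPAN` and `escape_quotes_in_html` are hypothetical names, and the pattern assumes the input contains a single HTML fragment.

```python
import json
import re

# Hypothetical helper, not part of json_repair.
# Matches from the first opening tag to the last closing tag (greedy).
HTML_SPAN = re.compile(r"<[A-Za-z][^<>]*>.*</[A-Za-z][^<>]*>", re.DOTALL)


def escape_quotes_in_html(text: str) -> str:
    """Escape unescaped double quotes inside the HTML span of `text`."""

    def _escape(match: re.Match) -> str:
        # Only touch quotes that are not already preceded by a backslash
        return re.sub(r'(?<!\\)"', r'\\"', match.group(0))

    return HTML_SPAN.sub(_escape, text)


json_str = '{\n"html": "<h3 >Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
fixed = escape_quotes_in_html(json_str)
data = json.loads(fixed)
print(data["html"])
```

Because the match is greedy, a document with HTML in two separate values would have the JSON delimiters between them escaped as well, so this only illustrates the idea for the single-value case.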
We might consider expanding the logic to detect the closing quote.
A similar approach is discussed here: https://github.com/josdejong/jsonrepair/pull/116
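To make the suggestion concrete, here is a minimal sketch of one possible closing-quote heuristic: a quote inside a string only counts as the closing quote if the next non-whitespace character is a JSON delimiter (`,`, `}`, `]`, `:`) or the end of the input. This is an assumption about how such logic could work, not the implementation used by json_repair or jsonrepair, and `escape_inner_quotes` is a hypothetical name.

```python
import json


def escape_inner_quotes(text: str) -> str:
    """Escape quotes inside strings that do not look like closing quotes."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if not in_string:
            if ch == '"':
                in_string = True
            out.append(ch)
        elif ch == "\\" and i + 1 < len(text):
            # Keep existing escape sequences untouched
            out.append(text[i:i + 2])
            i += 1
        elif ch == '"':
            rest = text[i + 1:].lstrip()
            if rest == "" or rest[0] in ",}]:":
                # Followed by a delimiter: treat as the real closing quote
                in_string = False
                out.append(ch)
            else:
                # Followed by ordinary text: treat as part of the value
                out.append('\\"')
        else:
            out.append(ch)
        i += 1
    return "".join(out)


json_str = '{\n"html": "<h3 >Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
data = json.loads(escape_inner_quotes(json_str))
print(data["html"])
```

The heuristic still misfires when an inner quote happens to be followed by a comma or colon, so it trades some safety for coverage.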
Good point, let me try something. I am not sure how it would interact with all the other use cases, but it is worth a try.
Actually, I noticed that the lib is doing that already :/ It is just limited to one use case because I wanted to be safe.
Thanks for pointing me in the right direction. I am releasing 0.14.0 with this fix.
Top work, @mangiucugna! Thanks for the quick fix!
Hi @mangiucugna,
Thank you for your efforts on this. I've encountered a similar issue with the output from the LLM. It seems that the `repair_json` function isn't handling certain cases correctly. For instance, when trying to repair the following JSON string:
The current output is:
However, the expected output should be:
It seems like the function is having trouble handling certain characters or nested structures properly. Would you mind looking into this further?
Thank you again for your attention to this matter.
_Originally posted by @nikolaysm in https://github.com/mangiucugna/json_repair/issues/20#issuecomment-2066249721_