Closed benjamingr closed 1 month ago
Hi! Thanks a lot for the report, finally someone that is using this feature. I will need to find some time to sit down and think about it. The basic idea is to avoid loading the entire file in memory because you don’t know how big that is, maybe I am overthinking it. Nonetheless the library should likely load a large chunk of the file and not one byte at a time
Feel free to think along with me and/or push a PR
Thanks for the quick response!
I think it's fine to buffer the first n
bytes of the file (e.g. 1MB) and fall back on loading it in chunks (of 1MB) at a time if it's larger than that arbitrary threshold.
Here is a draft patch to indicate what I had in mind https://github.com/mangiucugna/json_repair/pull/67 if you think it's a good direction I'm happy to test it better and run all the tooling.
I'm not sure this is the right approach it's probably still much faster to just check if the file is small and if it's under a size where it's problematic from a streaming/memory PoV just read it.
Thanks for the PR, the direction seems correct, I haven’t had the time to review in detail. One thing I didn’t see, and that I think is necessary, is the freeing of the buffer to save memory (say you keep max 2 chunks).
This is because otherwise is exactly the same as loading the entire file in memory, the good news is that the library looks back very little so just keeping 2 should be more than enough.
Version of the library
0.25.3
Describe the bug
Parsing a file with
from_file
is significantly slower than parsing it with.loads
due to theStringFileWrapper
abstraction. Namely, seeking one byte at a time is incredibly slow.cProfile and strace show the time is spent in:
Which makes sense since the library is reading one byte at a time
How to reproduce
With the following JSON file:
Calling:
json_repair.from_file('/tmp/json2.json')
hangs for a long time butjson_repair.loads(open('/tmp/json2.json').read())
finishes instantlyExpected behavior
from_file should be as fast as
.loads(open(file).read())