First of all, this library's been very helpful to me, thanks for all your time on it!
I ran into a weird issue where, as I streamed through some JSON objects, the process held onto more and more memory. This only happens with the python backend.
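For context, the usage was roughly like this. This is a minimal sketch rather than my actual script: the file name, the `'item'` prefix, and the (Unix-only) RSS printout are made up, and it assumes the usual `items()` API exposed by the backend module:

```python
import resource  # Unix-only, used here just to watch peak RSS

# pick the pure-python backend explicitly, since that's where the growth shows up
import ijson.backends.python as ijson

with open('big.json', 'rb') as f:  # hypothetical large top-level JSON array
    for i, obj in enumerate(ijson.items(f, 'item')):
        if i % 100000 == 0:
            # with the python backend this number keeps climbing, because the
            # lexer's internal buffer is appended to but never trimmed
            print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
```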
We managed to track the issue down to the Lexer method: https://github.com/isagalaev/ijson/blob/master/ijson/backends/python.py#L25

There, `buf` is appended to continuously, but none of the already-parsed data is ever discarded. We confirmed this is the issue by adding
```diff
                 lexeme = match.group()
                 yield discarded + match.start(), lexeme
                 pos = match.end()
+                buf = buf[pos:]
+                pos = 0
         else:
             data = f.read(buf_size)
             if not data:
```
^ those two lines. Not a recommended solution, but it did stop the memory from growing. I ended up switching to the yajl2_cffi backend as it's faster anyway, but this tripped me up for a bit!
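In case it helps anyone else, here's a standalone toy version of the same buffering pattern. This is not ijson's actual Lexer, just a sketch of why slicing the buffer keeps memory bounded: without the trim, `buf` accumulates everything read so far; with it, only the unconsumed tail survives each iteration. (A real fix would also have to bump `discarded` by the amount trimmed, as below, so the yielded offsets stay absolute.)

```python
import io
import re

# toy lexeme pattern: punctuation or runs of non-punctuation; NOT ijson's real regex
LEXEME_RE = re.compile(r'[^\s,:\[\]{}]+|[,:\[\]{}]')

def toy_lexer(f, buf_size=16 * 1024):
    """Yield (absolute_offset, lexeme) pairs while keeping buf roughly buf_size."""
    buf = f.read(buf_size)
    pos = 0
    discarded = 0  # how many characters have been trimmed off the front so far
    eof = False
    while buf:
        match = LEXEME_RE.search(buf, pos)
        # a match touching the end of the buffer may be truncated,
        # so only emit it once we know no more data is coming
        if match and (match.end() < len(buf) or eof):
            yield discarded + match.start(), match.group()
            pos = match.end()
            # the equivalent of the two "+" lines above, plus the offset bookkeeping:
            # drop the consumed prefix so buf never grows with the whole input
            discarded += pos
            buf = buf[pos:]
            pos = 0
        else:
            data = f.read(buf_size)
            if not data:
                if eof:
                    break
                eof = True
            buf += data

for offset, lexeme in toy_lexer(io.StringIO('{"a": [1, 2, 3]}')):
    print(offset, lexeme)
```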