python backend retains memory while streaming!

catstavi commented 7 years ago

First of all, this library's been very helpful to me, thanks for all your time on it!

I ran into a weird issue where, as I streamed through some JSON objects, the process held onto more and more memory. This only happens with the python backend.
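
One way to confirm this kind of growth without any ijson-specific tooling is to drain the stream under the standard-library `tracemalloc` module and compare peak traced memory. The sketch below is illustrative only: `chunks`, `retaining_reader` and `trimming_reader` are hypothetical toy generators, not ijson code. One appends every chunk to a single buffer (the pattern described in this issue), and the other keeps only the current chunk.

```python
import tracemalloc
from typing import Iterable, Iterator


def peak_bytes_while_consuming(items: Iterable[object]) -> int:
    """Drain `items` and return the peak memory traced by tracemalloc, in bytes."""
    tracemalloc.start()
    try:
        for _ in items:
            pass
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def chunks(n_chunks: int, size: int) -> Iterator[str]:
    """Hypothetical stand-in for repeated f.read(buf_size) calls."""
    for _ in range(n_chunks):
        yield "x" * size


def retaining_reader(n_chunks: int, size: int) -> Iterator[int]:
    """Appends every chunk to one buffer and never drops consumed data."""
    buf = ""
    for chunk in chunks(n_chunks, size):
        buf += chunk
        yield len(buf)


def trimming_reader(n_chunks: int, size: int) -> Iterator[int]:
    """Keeps only the chunk currently being processed."""
    for chunk in chunks(n_chunks, size):
        buf = chunk
        yield len(buf)


retained = peak_bytes_while_consuming(retaining_reader(200, 10_000))
trimmed = peak_bytes_while_consuming(trimming_reader(200, 10_000))
print(f"retaining: {retained} bytes peak, trimming: {trimmed} bytes peak")
```

In a real reproduction the drained iterable would be the events or items produced by ijson's python backend over a large file; that wiring is left out here because it depends on the installed ijson version.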

We managed to track the issue down to the Lexer function. https://github.com/isagalaev/ijson/blob/master/ijson/backends/python.py#L25

There, buf is being appended to continuously, but none of the old parsed data is ever discarded. We confirmed this is the issue by adding

```diff
                     lexeme = match.group()
                 yield discarded + match.start(), lexeme
                 pos = match.end()
+            buf = buf[pos:]
+            pos = 0
         else:
             data = f.read(buf_size)
             if not data:
```

^ Those two lines. It's not a recommended solution, but it did stop the memory from growing. I ended up switching to the yajl2_cffi backend since it's faster anyway, but this tripped me up for a bit!
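
Judging from the context lines in the diff, the patch trims `buf` but leaves `discarded` unchanged, so offsets yielded afterwards as `discarded + match.start()` would come out too small. That may be one reason it isn't a recommended fix. Below is a minimal sketch of the same idea that keeps offsets absolute by advancing the counter by exactly what it drops. It assumes a hypothetical whitespace-separated token grammar (`TOKEN_RE`, `trimming_lexer`) rather than ijson's real JSON lexer.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Hypothetical token grammar for illustration: any run of non-whitespace characters.
TOKEN_RE = re.compile(r"\S+")


def trimming_lexer(f: TextIO, buf_size: int = 16) -> Iterator[Tuple[int, str]]:
    """Yield (absolute_offset, token) pairs, keeping only unconsumed text in `buf`."""
    buf = ""
    discarded = 0  # characters already dropped from the front of the stream
    eof = False
    while True:
        match = TOKEN_RE.search(buf)
        if match is None:
            # Only whitespace is buffered: drop it all (and count it) before reading more.
            discarded += len(buf)
            buf = ""
            if eof:
                return
        elif match.end() < len(buf) or eof:
            # The token is complete: whitespace follows it, or the input is exhausted.
            yield discarded + match.start(), match.group()
            # Trim consumed text AND advance `discarded` by the same amount,
            # so later offsets are still positions in the whole stream.
            discarded += match.end()
            buf = buf[match.end():]
            continue
        # Nothing usable is buffered, or the token may continue in the next chunk.
        data = f.read(buf_size)
        if data:
            buf += data
        else:
            eof = True


text = "alpha  beta\ngamma delta"
print(list(trimming_lexer(io.StringIO(text), buf_size=4)))
# [(0, 'alpha'), (7, 'beta'), (12, 'gamma'), (18, 'delta')]
```

Slicing `buf[match.end():]` copies the unconsumed tail on every token; a less copy-heavy variant could keep a `pos` index as the original code does and only trim just before appending a new chunk. Either way, what stays buffered is roughly one chunk plus the token in progress rather than the whole input.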

franklingu commented 7 years ago

What is the status of this issue? Thanks!

isagalaev commented 4 years ago

The project has moved over to a new maintainer: https://github.com/rtobar/ijson. Please reopen this issue there if it's still relevant.

isagalaev/ijson issue #60