Closed rtobar closed 5 years ago
Hi Ivan,
Any chance to get this work reviewed and integrated?
Any chance to get this work reviewed and integrated?
Yes! Looks like great stuff. I'll look at it once I get around ijson again. Being ve-e-ery busy at the moment :-)
Hi Rodrigo! I have finally found some time to review this pull request. It is a great addition and I'm looking forward to committing it after we iron out a few things I mentioned above. Thanks a lot!
Hi Ivan,
Thanks to you for taking the time to review these changes. I'm currently on holiday, and won't be back to the office until the last week of January, so I might or not be able to do something before then.
I responded to your observations above, which all seem very reasonable. I'm expecting small feedback only on one small point, the rest I'll implement as requested.
Thanks again!
Thanks for a quick reply! I'll try to be more responsive as well :-) Take your holiday time!
Hi, I have a need to iteratively parse very large JSON objects and I've decided to try this branch out for more speed. It works fine with Py2.7, but with Py3.6, I get a segfault. Here's a backtrace
(gdb) bt
#0 0x00007fbaf1320635 in ?? () from /usr/lib/libpython3.6m.so.1.0
#1 0x00007fbaf13b579e in PyArg_ParseTupleAndKeywords () from /usr/lib/libpython3.6m.so.1.0
#2 0x00007fbaeff881b3 in basicparse_init (self=0x7fbaf0476e30, args=<optimized out>, kwargs=<optimized out>) at ijson/backends/_yajl2.c:233
#3 0x00007fbaeff87cdf in parsegen_init (self=0x7fbaf0476e30, args=<optimized out>, kwargs=<optimized out>) at ijson/backends/_yajl2.c:447
#4 0x00007fbaeff87c09 in itemsgen_init (self=0x7fbaf0476e30, args=<optimized out>, kwargs=0x0) at ijson/backends/_yajl2.c:760
#5 0x00007fbaf14432ce in ?? () from /usr/lib/libpython3.6m.so.1.0
#6 0x00007fbaf145d85c in _PyObject_FastCallDict () from /usr/lib/libpython3.6m.so.1.0
#7 0x00007fbaf13e4396 in ?? () from /usr/lib/libpython3.6m.so.1.0
#8 0x00007fbaf13a5067 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.6m.so.1.0
#9 0x00007fbaf13e3d4a in ?? () from /usr/lib/libpython3.6m.so.1.0
#10 0x00007fbaf13e4303 in ?? () from /usr/lib/libpython3.6m.so.1.0
#11 0x00007fbaf13a5067 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.6m.so.1.0
#12 0x00007fbaf13e4757 in PyEval_EvalCodeEx () from /usr/lib/libpython3.6m.so.1.0
#13 0x00007fbaf13a4d4b in PyEval_EvalCode () from /usr/lib/libpython3.6m.so.1.0
#14 0x00007fbaf1486112 in ?? () from /usr/lib/libpython3.6m.so.1.0
#15 0x00007fbaf148897d in PyRun_FileExFlags () from /usr/lib/libpython3.6m.so.1.0
#16 0x00007fbaf1488b67 in PyRun_SimpleFileExFlags () from /usr/lib/libpython3.6m.so.1.0
#17 0x00007fbaf147ca91 in Py_Main () from /usr/lib/libpython3.6m.so.1.0
#18 0x0000000000400a5d in main ()
(gdb)
and a simple code that replicates it
from io import StringIO
import ijson.backends.yajl2_c as ijson
test = StringIO('[{"events": 1}]')
items = ijson.items(test, 'item.events')
Thanks for working on this, cheers!
@KenjiTakahashi Python 3.6 is fairly new, it wasn't around at the time of submitting these changes, and thus it hadn't been tested until now. I'll definitely have a look at this new issue once I'm back at work in ~1.5 weeks; until then you can use Python 3.5 which should work fine.
Hi @isagalaev,
I'm finally back at work!
After some reading I learned that the "All rights reserved" phrase has no legal effect at all nowadays, and thus I simply removed it from the two files I added in this pull request. The importing of the C module has been changed as agreed, and the additional printing from the tests has been removed as well.
I've also included a fix for the bug reported by @KenjiTakahashi, and extended the python versions used by travis to include 3.5 and 3.6.
@isagalaev any plans for getting this in?
pinging again
Sorry, I'm trying to find time to merge it and release a new version. I thought I'd be able to do it during PyCon past weekend but didn't get around…
Just the usual ping...
It's been some time again, hasn't it :-)?
@isagalaev Yearly ping - is there a chance you can take a look at this again and hopefully merge it?
Hey everyone. I lost track of things happening here a while ago and don't remember why I didn't merge it in the first place. It's also unlikely I'll find time to get back to it any time soon. As far as I understand, @rtobar fork is alive and well, so I recommend everyone just use it.
(I'll close the ticket to avoid confusion, but feel free to continue conversation if needed.)
@isagalaev Assuming @rtobar is interested, would you be willing to give him PyPi privileges to make new releases?
Sure, I wouldn't mind on my side. My user in PyPI is rtobar too.
@rtobar hey! Added you on PyPI as an owner. I can't seem to remove myself, so feel free to either leave me there for precaution or take it over completely.
Thanks Ivan! I already exercised this and uploaded a new 2.4 version of ijson to PyPI, including binary wheels for the main x64 linux/macos.
Yay! Glad not to be a bottleneck anymore! And thanks to @wichert for kicking me in the right direction.
This pull request is to add a new backend based on a C extension. The C extension exposes three generators:
basic_parse
,parse
anditems
, which work the same way their counterparts in the other backends and in thecommon
module work. Reference counting seems correct, creating neither memory corruptions nor memory leaks when parsing big amounts of data.The main advantage of this new backend is its speed. Local tests indicate a gain of ~10x for the
items
generator, and even more for theparse
andbasic_parse
generators, when compared with the cffi backend. This makes this backend comparable to thejson
module in terms of speed, while still giving users an iterative way of consuming a JSON stream, and therefore keeping memory consumption low.I've re-organized the commits from their original form to make the pull request more compact and manageable. Commits include not only the new backend, but also their integration to the package installation procedure (optional, only if the compiler/libraries are found) and to the tests suite. I've also modified the travis configuration to use YAJL2 for testing, effectively testing more backends now (4 instead of 2).