Closed Dav1dde closed 9 years ago
Wow, this definitely looks cool! Thanks for investigating it. To be frank, C/Python interop is unexplored territory for me, so if you could fork it and finish the CFFI backends, that'd be awesome!
I am currently working on a pull request. I want to restructure the yajl backends so that they load automatically in this order: yajl2-cffi, yajl2-ctypes, yajl1-cffi, yajl1-ctypes. Is there any benefit to letting the user choose which backend they want?
According to the patch notes, yajl2 is 20%-30% faster than yajl1, so is there any reason to choose yajl1 over yajl2? And from my small tests, cffi looks to be always faster than ctypes (not to mention the huge gain on PyPy), so cffi should be preferred?
One more thing: I would actually change ijson.__init__ to load the fastest backend available, in this order: yajl2-cffi, yajl2-ctypes, yajl1-cffi, yajl1-ctypes, python.
What are your thoughts on this?
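For illustration, the ordered fallback described above could be sketched roughly as follows. This is only a sketch under assumptions: the module names in `PREFERRED_ORDER` are hypothetical stand-ins for the five backends listed, the helper `load_first_available` is not part of ijson, and it assumes that an unavailable backend signals this by raising `ImportError` on import.

```python
import importlib


def load_first_available(candidates):
    """Return the first module in `candidates` that imports without ImportError."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # This candidate is unavailable; remember why and try the next one.
            errors.append((name, exc))
    raise ImportError("no backend could be loaded: %r" % (errors,))


# Hypothetical module names, listed in the order proposed in this comment.
PREFERRED_ORDER = [
    "ijson.backends.yajl2_cffi",
    "ijson.backends.yajl2",
    "ijson.backends.yajl_cffi",
    "ijson.backends.yajl",
    "ijson.backends.python",
]
```

Calling `load_first_available(PREFERRED_ORDER)` would then return whichever backend imports first. Only `ImportError` is caught on purpose, so any other failure inside a backend surfaces instead of being silently skipped.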
+1 for @Dav1dde's thoughts here. We're using ijson to iterate over some very large Elasticsearch aggregations, and the importance of using the yajl2 backend was initially lost on me.
:beers:
Sorry for letting this hang for so long…
The automatic selection of the fastest backend was removed in 96defaf to fix #22. Basically, it seems impractical to test every combination of backend and environment for weird bugs in the selection algorithm, and since the selection runs unconditionally, a bug there risks making the whole library unusable. I think the reasoning still stands, even though I understand that one might not think to read through the README when the Python backend just works out of the box. I don't know a good way around this yet.
I'm going to try and fix the tests in the CFFI branch and merge it.
So, I was playing around with parsing huge JSON files (19 GiB; the test file is ~520 MiB) and wanted to try some sample code with PyPy. It turns out PyPy needed ~1:30-2:00, whereas Python 2.7 needed ~13 seconds (the pure-Python implementation came in at ~8 minutes).
Apparently ctypes performs really badly, especially on PyPy. So I made a quick CFFI mockup: https://gist.github.com/Dav1dde/c509d472085f9374fc1d
Before:
After (CFFI):
Maybe it would make sense to add an additional CFFI backend that is chosen over ctypes whenever CFFI is available.
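As a rough sketch of what "chosen over ctypes if CFFI is available" might look like, one could check whether the `cffi` package is importable without actually importing it. The helper name `preferred_binding` is hypothetical and only returns a label; wiring it to real backend modules is left out.

```python
import importlib.util


def preferred_binding(preferred="cffi", fallback="ctypes"):
    """Return `preferred` if a top-level package of that name is installed, else `fallback`."""
    # find_spec returns None when no top-level module of that name can be found.
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback
```

Note that this only checks that the package is installed, not that the yajl shared library itself can be loaded through it.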
Testcode: