Closed · alexdutton closed this 6 years ago
You can also move the try-except in `JsonPointer.resolve` out of the `for` loop to prevent Python from starting and stopping the exception-handling context on every iteration. This will gain you a few points on the last test case.
Also, it may be worth optionally cythonizing jsonpointer for a greater speedup.
The latest commit gives me the following numbers:
1.25914096832 2.52786207199 2.27428197861 0.568547010422
… but adds a bit more duplication. It also makes `get_part` and `walk` redundant, but I've left those in because other people might be relying on them.
Ah, the coverage decrease will be because the tests no longer cover `get_part` and `walk` (and I've added more lines of code).
@alexsdutton my idea was just to replace this:
```python
for part in self.parts:
    try:
        doc = self.walk(doc, part)
    except JsonPointerException:
        if default is _nothing:
            raise
        else:
            return default
return doc
```
with:
```python
try:
    walk = self.walk
    for part in self.parts:
        doc = walk(doc, part)
except JsonPointerException:
    if default is _nothing:
        raise
    return default
else:
    return doc
```
Inlining function calls may indeed change the timings because stack frames aren't cheap, but there should be a balance between performance optimization and code quality. Again, Cythonization will give you much better numbers without needing to play with micro-optimizations.
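For illustration, a rough `timeit` sketch of the two loop shapes being discussed (a hypothetical micro-benchmark, not the one used in this thread; the per-iteration try/except setup cost was measurable on CPython 2.x-era interpreters, while modern CPython has largely eliminated it):

```python
import timeit

setup = "parts = list(range(1000))"

# try/except entered inside the loop, once per iteration
inside = """
total = 0
for p in parts:
    try:
        total += p
    except ValueError:
        pass
"""

# try/except entered once, outside the loop
outside = """
total = 0
try:
    for p in parts:
        total += p
except ValueError:
    pass
"""

t_inside = timeit.timeit(inside, setup=setup, number=2000)
t_outside = timeit.timeit(outside, setup=setup, number=2000)
print(t_inside, t_outside)
```

Note the trade-off mentioned above still applies: the second shape can no longer tell which iteration raised, which is exactly why `resolve` needs the `for`/`else`-style restructuring rather than a blind hoist.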
@kxepal that gives:
1.53566789627 3.51046395302 3.03776717186 0.941489219666
I can play with Cython to see how that fares, but presumably that would lead to two codebases (i.e. you'd still need the pure Python implementation for non-CPython environments)?
@alexsdutton no, it doesn't lead to two codebases. You can make the Cython module optional and fall back to the pure Python one in cases where it's hard to build (hello, Windows) or doesn't make sense to build (hello, PyPy).
About the numbers: yes, there isn't much gain, but this has to be done if you're going to optimize code for performance (:
The reason I'm trying to push it as far as it can go is that I've got code that's doing 5.8m `resolve` calls, and it's taking 100s just for those at its most optimized, down from 175s (I think) when I started.
re codebases, you'd have the Cython implementation, and the Python one, which is two. Have I missed something?
(I presume the usual approach is to do a `try: import _jsonpointer` to get the Cython implementation, and if that fails, use the pure Python implementation)
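A minimal sketch of that fallback pattern (the `_jsonpointer` module name and the simplified `resolve` below are hypothetical, just to show the shape):

```python
try:
    # compiled Cython accelerator, if it was built for this platform
    from _jsonpointer import resolve
except ImportError:
    # pure-Python fallback with the same signature
    def resolve(doc, parts, default=None):
        for part in parts:
            try:
                doc = doc[part]
            except (KeyError, IndexError, TypeError):
                return default
        return doc

print(resolve({"a": {"b": 1}}, ["a", "b"]))  # 1 (pure-Python path here)
```

Callers import one name and never need to know which implementation they got.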
@alexsdutton sure, things could be done iteratively. Your improvements are significant enough to boost the performance.
As for Cython, take a look at https://github.com/KeepSafe/aiohttp - it has the multidict module implemented in both Python and Cython at the same time.
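The build side of that approach might look roughly like this (a `setup.py` sketch with hypothetical file names; aiohttp's actual build logic is more involved):

```python
from setuptools import setup, Extension

try:
    from Cython.Build import cythonize
    # Cython available: compile the accelerated module from its .pyx source
    ext_modules = cythonize([Extension("_jsonpointer", ["_jsonpointer.pyx"])])
except ImportError:
    # no Cython installed: ship the pure-Python module only
    ext_modules = []

setup(
    name="jsonpointer",
    py_modules=["jsonpointer"],
    ext_modules=ext_modules,
)
```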
Well, that was easy. Pulled out just the `resolve` method, added a `cdef unicode part` (which is presumably bad for Py3 compat) and the numbers become:
1.12338781357 2.26954817772 2.0227599144 0.52579498291
Though I have broken the tests. (Yes, all a bit hacky at this hour)
@alexsdutton w00t! Looks promising, doesn't it? (;
Worth trying to get the attention of @stefankoegl at this point?
(and yes, certainly promising :grin:)
Sorry for the late reply! I was away from computers for a few weeks.
Yes, this certainly looks promising! I've added one inline comment. It might be just an optimization, but if so, it should be commented (otherwise it might not be noticed, and be removed in the future).
I was profiling some of my code which used jsonpointer, and I noticed that resolving pointers could be made faster. My profiling script is at https://gist.github.com/alexsdutton/bb95e47a381d0e250fd3/988feaa0e8968893287df01c87eb233c9b86a010. Running on af04ec0, we get the following times on my laptop (MacBook Air, Fedora, Python 2.7.8):
After each commit we get, respectively:
So, about a 2x speed-up on the original code.