stefankoegl / python-json-pointer

Resolve JSON Pointers in Python
https://python-json-pointer.readthedocs.org/

More efficient resolving #16

Closed alexdutton closed 6 years ago

alexdutton commented 9 years ago

I was profiling some of my code which used jsonpointer, and I noticed that resolving pointers could be made faster.

My profiling script is at https://gist.github.com/alexsdutton/bb95e47a381d0e250fd3/988feaa0e8968893287df01c87eb233c9b86a010. Running on af04ec0, we get the following times on my laptop (Macbook Air, Fedora, Python 2.7.8):

3.43161201477 (/a/b/c/d/e/f)
8.24996113777 (/b/1/1/1/1/1)
7.04101395607 (/c/1/d/1/e/1/f)
1.00753712654 (/-)
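For reference, numbers of this shape can be reproduced with a small `timeit` harness along these lines (a minimal sketch: `resolve` here is a hypothetical stand-in for `JsonPointer.resolve`, not the library's implementation, and the document keys are illustrative):

```python
import timeit

# Stand-in resolver (NOT the library's code): walks a JSON Pointer's
# reference tokens through nested dicts/lists, one lookup per token.
def resolve(doc, pointer):
    for part in pointer.lstrip('/').split('/'):
        if isinstance(doc, list):
            doc = doc[int(part)]
        else:
            doc = doc[part]
    return doc

doc = {'a': {'b': {'c': {'d': {'e': {'f': 42}}}}}}

# Time many repeated resolutions of the same pointer, as the gist does.
elapsed = timeit.timeit(lambda: resolve(doc, '/a/b/c/d/e/f'), number=100000)
print(elapsed)
```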

After each commit we get, respectively:

2.40125393867
5.88285589218
4.9583530426
1.05458307266

3.08297109604
4.70882201195
4.62245106697
1.13339781761

1.59154200554
3.8058848381
3.40330195427
0.9848549366

So, about a 2x speed-up on the original code.

coveralls commented 9 years ago

Coverage Status

Coverage remained the same at 100.0% when pulling c4372cdb3829138a1e5287e1c4ef8dc9c6c5b5a5 on alexsdutton:more-efficient-resolving into af04ec01a9aa10a723677875ddf1fe6647d6a541 on stefankoegl:master.


kxepal commented 9 years ago

You can also move the try-except in JsonPointer.resolve out of the for loop, so Python doesn't have to set up and tear down the exception-handling context on every iteration. That should shave a few points off the last test case.

kxepal commented 9 years ago

Also, it may be worth optionally cythonizing jsonpointer for a greater speedup.

alexdutton commented 9 years ago

The latest commit gives me the following numbers:

1.25914096832
2.52786207199
2.27428197861
0.568547010422

… but adds a bit more duplication. It also makes get_part and walk redundant, but I've left those in because other people might be relying on them.

coveralls commented 9 years ago

Coverage Status

Coverage decreased (-17.78%) to 82.22% when pulling 9653d1be314e14ae00d86b13807d2d22c0d55e98 on alexsdutton:more-efficient-resolving into af04ec01a9aa10a723677875ddf1fe6647d6a541 on stefankoegl:master.

alexdutton commented 9 years ago

Ah, the coverage decrease will be because the tests no longer cover get_part and walk (and I've added more lines of code).

kxepal commented 9 years ago

@alexsdutton my idea was just to replace this:

for part in self.parts:
  try:
    doc = self.walk(doc, part)
  except JsonPointerException:
    if default is _nothing:
      raise
    else:
      return default
return doc

with:

try:
  walk = self.walk
  for part in self.parts:
    doc = walk(doc, part)
except JsonPointerException:
  if default is _nothing:
    raise
  return default
else:
  return doc
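The two variants are behaviorally equivalent; a minimal self-contained sketch (with a stand-in `walk` and exception class, not the library's own) that exercises both the success path and the default-fallback path:

```python
_nothing = object()  # sentinel meaning "no default was supplied"

class JsonPointerException(Exception):
    pass

def walk(doc, part):
    # Stand-in for JsonPointer.walk: a single lookup step.
    try:
        return doc[part]
    except (KeyError, IndexError, TypeError):
        raise JsonPointerException(part)

def resolve(doc, parts, default=_nothing):
    # Single try-except wrapped around the whole loop, as suggested above,
    # instead of entering a new handler on every iteration.
    try:
        for part in parts:
            doc = walk(doc, part)
    except JsonPointerException:
        if default is _nothing:
            raise
        return default
    else:
        return doc

doc = {'a': {'b': 1}}
print(resolve(doc, ['a', 'b']))             # -> 1
print(resolve(doc, ['a', 'x'], default=0))  # -> 0
```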

Inlining function calls may indeed change the timings, because stack frames aren't cheap, but there should be a balance between performance optimizations and code quality. Again, Cythonization will give you much better numbers without the need to play with micro-optimizations.

alexdutton commented 9 years ago

@kxepal that gives:

1.53566789627
3.51046395302
3.03776717186
0.941489219666

I can play with Cython to see how that fares, but presumably that would lead to two codebases (i.e. you'd still need the pure Python implementation for non-CPython environments)?

kxepal commented 9 years ago

@alexsdutton no, it doesn't lead to two codebases. You can make the Cython module optional and fall back to the pure Python one in cases where it's hard to build (hello, Windows) or doesn't make sense to build (hello, PyPy).

kxepal commented 9 years ago

About the numbers: yes, there's not much gain, but this has to be done if you're going to optimize the code for performance (:

alexdutton commented 9 years ago

The reason I'm trying to push it as far as it can go is that I've got code that does 5.8m resolve calls, and it's taking 100s just for those at its most optimized, down from 175s (I think) when I started.

re codebases, you'd have the Cython implementation, and the Python one, which is two. Have I missed something?

(I presume the usual approach is to do a "try: import _jsonpointer" to get the Cython implementation, and if that fails, use the pure Python implementation)
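That fallback pattern, sketched (the module and function names here are illustrative; `_jsonpointer` stands in for the hypothetical compiled extension):

```python
try:
    # Prefer the compiled extension when it was built and installed.
    from _jsonpointer import resolve_pointer  # hypothetical Cython module
except ImportError:
    # Otherwise fall back to pure Python (a stand-in implementation here).
    def resolve_pointer(doc, parts):
        for part in parts:
            doc = doc[part]
        return doc

print(resolve_pointer({'a': {'b': 3}}, ['a', 'b']))  # -> 3
```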

kxepal commented 9 years ago

@alexsdutton sure, things can be done iteratively. Your improvements already boost performance significantly.

As for Cython, take a look at https://github.com/KeepSafe/aiohttp - its multidict module is implemented in both Python and Cython at the same time.

alexdutton commented 9 years ago

Well, that was easy. I pulled out just the resolve method and added a cdef unicode part (which is presumably bad for Py3 compat), and the numbers become:

1.12338781357
2.26954817772
2.0227599144
0.52579498291

Though I have broken the tests. (Yes, it's all a bit hacky at this hour.)

kxepal commented 9 years ago

@alexsdutton w00t! Looks promising, doesn't it? (;

alexdutton commented 9 years ago

Worth trying to get the attention of @stefankoegl at this point?

alexdutton commented 9 years ago

(and yes, certainly promising :grin:)

stefankoegl commented 9 years ago

Sorry for the late reply! I was away from computers for a few weeks.

Yes, this certainly looks promising! I've added one inline comment. It might be just an optimization, but if so, it should be commented (otherwise it might not be noticed and could be removed in the future).