zopefoundation / Products.ZCatalog

Zope's indexing and search solution.

Further optimize excluding results in not queries #125

Closed. ale-rt closed this 2 years ago.

ale-rt commented 2 years ago

Usually the number of parameters that have to be excluded in a not query is much lower than the number of values in the index, so it makes sense to actually try to pop them out of the list of keys instead of filtering every key.

ale-rt commented 2 years ago
from timeit import timeit

NUMBER = 1000
ITEMS_IN_INDEX = 1000000

index = {x: None for x in range(ITEMS_IN_INDEX)}
not_parm = [666]

def _before():
    # Current approach: test every key for membership in the not_parm list.
    [k for k in index.keys() if k not in not_parm]

def _after():
    # Proposed approach: copy the keys once, then remove the excluded values.
    keys = list(index)
    for parm in not_parm:
        keys.remove(parm)

print("_before:   ", timeit(_before, number=NUMBER))
print("_after:   ", timeit(_after, number=NUMBER))

[ale@flo daskjdklsa]$ python3.9 test.py 
_before:    45.822187587999906
_after:    11.951056316999711
jamadden commented 2 years ago

Given:

def _before():
    [k for k in index.keys() if k not in not_parm]

Because not_parm is a list, this code is O(N * K) where N is the number of keys and K is the length of not_parm. It's equivalent to these nested loops:

result = []
for k in index.keys():  # O(N)
    for i in not_parm:  # O(K)
        if i == k:
            break
    else:
        result.append(k)

Given:

def _after():
    keys = list(index)
    for parm in not_parm:
        keys.remove(parm)

This code is O(N) + O(K * N) + O(N) = O(2N + N * K): it first iterates the index to build a list, then for each excluded value it scans that list to find it, and finally it resizes the list when the element is removed:

keys = []
for k in index:  # O(N)
    keys.append(k)
for parm in not_parm:  # O(K)
    for i, k in enumerate(keys):  # O(N)
        if k == parm:
            del keys[i]  # O(N)
            break

Now, that last part is actually optimized to "just" a memcpy so for smaller values of N it can probably be ignored, especially on newer machines, leaving us with O(N + N*K).
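
Plugging the benchmark's numbers into those estimates (purely a back-of-the-envelope count of abstract operations, not a measurement):

N = 1_000_000   # keys in the index
K = 1           # values in not_parm

ops_before = N * K      # one membership test per key against the list  -> 1,000,000
ops_after = N + N * K   # copy every key, then scan the copy to remove  -> 2,000,000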

If I've done that right, it means that algorithmically, the second approach should be worse by O(N). Yet I similarly produce results showing that _after is substantially faster than _before (I moved a loop inside the function to eliminate function call overhead from the timings):

In [22]: def _after():
    ...:     for _ in range(NUMBER):
    ...:         keys = list(index)
    ...:         for parm in not_parm:
    ...:             keys.remove(parm)
    ...:
In [23]: def _before():
    ...:     for _ in range(NUMBER):
    ...:         [k for k in index.keys() if k not in not_parm]
    ...:
In [24]: %timeit _before()
611 ms ± 7.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [25]: %timeit _after()
189 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So there's probably something else going on. And I think a large part of it is that the first version iterates entirely in Python, while the second version does the bulk of the work (copying the keys with list()) entirely in optimized C, leaving only a small amount of iteration in Python. Directly comparing those aspects shows a similar difference:

In [26]: def _comprehension():
    ...:     for _ in range(NUMBER):
    ...:         [k for k in index.keys()]
    ...:
In [27]: def _builtin():
    ...:     for _ in range(NUMBER):
    ...:         list(index)
    ...:
In [28]: %timeit _comprehension()
363 ms ± 8.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [29]: %timeit _builtin()
173 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
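
If the comprehension is kept, the linear scan of not_parm can also be avoided by turning it into a set first, so each membership test is O(1). A minimal sketch (names made up for illustration, and not something I benchmarked here):

not_parm_set = set(not_parm)

def _before_with_set():
    # Same comprehension, but membership is now an O(1) set lookup, making the
    # whole filter O(N); the per-key loop still runs at Python speed, though.
    [k for k in index.keys() if k not in not_parm_set]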

Anyway, this was just for my own edification because I was surprised at the results.

ale-rt commented 2 years ago

Thanks a lot @jamadden for your analysis.

I think we can even squeeze out some more time by using sets instead of lists, but I would only do that in a later PR because it will probably require quite a bit more work and performance tests.
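
For illustration, one possible shape of a set-based variant (an untested sketch with made-up names, ignoring for now whether the callers need the keys in index order):

def _with_sets():
    # Compute the kept keys as a set difference: both the copy and the
    # exclusion happen in C, but the result is an unordered set.
    return set(index).difference(not_parm)

Whether that is directly usable depends on what the calling code expects (a list in index order vs. an unordered set), which is part of the extra work and performance testing mentioned above.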