Optimize btree_state_search

nucypher / zerodb

*This project is no longer actively maintained. If you'd like to become the maintainer, please let us know.* ZeroDB is an end-to-end encrypted database. Data can be stored and queried on untrusted database servers without ever exposing the encryption key. Clients can execute remote queries against the encrypted data without downloading all of it or suffering an excessive performance hit.

GNU Affero General Public License v3.0

1.56k stars 102 forks source link

Optimize btree_state_search #34

Closed micxjo closed 8 years ago

micxjo commented 8 years ago

Based on playing around w/ https://github.com/zero-db/zerodb-benchmarks I saw that btree_state_search was a good candidate for a little micro-optimization. The change below decreases the default benchmark runtime by a little more than 10% for me (CPython2.7). Plus, it's python3 compatible (no cmp()).

Note that the if i > 0 check in get_key is unnecessary because we know that i > lo and lo >= 0.

michwill commented 8 years ago

Oh wow, cool!

Also you've probably noticed the --with-profiling key? It produces a png which shows you a call stack with performance properties. So, you can look at micro-level what's going on.

One thing I thought about this particular function. It was copied from pure-python implementation of B-Trees. B-Trees have its implementation in C as well because it can have a performance hit.

Idea for more optimization: what if instead of writing it in C we use bisect? It's implemented in C, so in principle should be faster than pure-python implementation of the same (and more readable)

micxjo commented 8 years ago

Indeed, switching to a simple

def btree_state_search(state, key):
    if not state:
        return -1, None

    state = state[0]

    i = bisect.bisect_left(state, key, lo=(len(state) + 1) // 2)
    return i // 2, state[i - 1]

gives a massive improvement on the benchmark.

michwill commented 8 years ago

Wow!! We can also add a test for it though, comparing with ~~BTrees._base._BucketBase._search~~ simply list's .index() method