visualfabriq / bquery

A query and aggregation framework for Bcolz (W2013-01)
https://www.visualfabriq.com
BSD 3-Clause "New" or "Revised" License
56 stars, 11 forks

performance improvements to _factorize_str_helper #21

Open ARF1 opened 9 years ago

ARF1 commented 9 years ago

Speedup of ca. 1.5x compared with the master branch on my machine with compressed bcolz.

Approximate contributions:

ARF1 commented 9 years ago

Uncompressed bcolz timings on my machine:

bquery master:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 2.46 s per loop

pull request:
In [3]: %timeit -r 10 a.cache_factor(['isin'], refresh=True)
1 loops, best of 10: 1.59 s per loop

==> Factor: 1.5

Compressed bcolz timings on my machine:

bquery master:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 4.03 s per loop

pull request:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 3.13 s per loop

==> Factor: 1.3
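
For reference, the factors quoted above are simply the ratio of the best-of-10 timings (master divided by pull request). A minimal sketch of that arithmetic follows; the dictionary layout and the `speedup` helper are illustrative names only, not part of bquery or bcolz:

```python
# Best-of-10 timings in seconds, copied from the %timeit output above.
timings = {
    "uncompressed": {"master": 2.46, "pull_request": 1.59},
    "compressed": {"master": 4.03, "pull_request": 3.13},
}


def speedup(master_s, pr_s):
    """Return how many times faster the pull request is than master."""
    return master_s / pr_s


# Hypothetical helper structure: one speedup factor per storage mode.
factors = {
    name: speedup(t["master"], t["pull_request"])
    for name, t in timings.items()
}

for name, factor in factors.items():
    print(f"{name}: factor {factor:.1f}")
```

Rounded to one decimal place this gives 1.5 for uncompressed and 1.3 for compressed, matching the figures reported above.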

Possible additional optimizations (but probably fairly minor):