Open ARF1 opened 9 years ago
Uncompressed bcolz timings on my machine:
bquery master:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 2.46 s per loop
pull request:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 1.59 s per loop
==> Factor: 1.5
Compressed bcolz timings on my machine:
bquery master:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 4.03 s per loop
pull request:
In [3]: %timeit -r 10 a.cache_factor(['mycol'], refresh=True)
1 loops, best of 10: 3.13 s per loop
==> Factor: 1.3
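For reference, the two factors follow directly from the best-of-10 timings quoted above (master time divided by pull request time). A minimal sketch of that arithmetic; the dictionary layout and the `speedup` helper are illustrative only and not part of bquery:

```python
# Best-of-10 timings in seconds, copied from the %timeit runs above.
timings = {
    "uncompressed": {"master": 2.46, "pull_request": 1.59},
    "compressed": {"master": 4.03, "pull_request": 3.13},
}


def speedup(master_s, pr_s):
    """Return how many times faster the pull request is than master."""
    return master_s / pr_s


# One speedup factor per storage mode.
factors = {
    name: speedup(t["master"], t["pull_request"])
    for name, t in timings.items()
}

for name, factor in factors.items():
    # Rounded to one decimal, matching the "Factor" lines above.
    print(f"{name}: x{factor:.1f}")
```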
Possible additional optimizations (but probably fairly minor):
- reverse_keys in _factorize_str_helper could be dropped, since it is merely an increasing sequence of integers up to reverse_values.size-1. (I wanted to keep the code logic as close to the original as possible.)
- In factorize_str, the reverse Python dictionary (a hash table with expensive insertion, I think) is created only to be thrown away after carray_values has been created. (Changing this would have obvious knock-on effects on other helper functions.)
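To illustrate why both items are redundant, here is a rough pure-Python sketch of a factorization step. It is not the Cython code in bquery: the `factorize` function and the `lookup` name are hypothetical, and only `reverse_keys`, `reverse_values` and `reverse` echo names from the discussion above. It assumes codes are handed out in order of first appearance, so the keys are just the positions 0 .. size-1.

```python
def factorize(values):
    """Map each distinct value to an integer code in order of first appearance."""
    lookup = {}          # value -> code
    reverse_values = []  # code -> value, addressed by position
    codes = []
    for value in values:
        if value not in lookup:
            # The next unused code is simply the current number of distinct values.
            lookup[value] = len(reverse_values)
            reverse_values.append(value)
        codes.append(lookup[value])
    return codes, reverse_values


codes, reverse_values = factorize(["b", "a", "b", "c", "a"])

# A separate key sequence only repeats the positions 0 .. size-1 ...
reverse_keys = list(range(len(reverse_values)))

# ... so a dictionary built from it holds nothing the list does not already hold.
reverse = dict(zip(reverse_keys, reverse_values))
same_information = all(reverse[k] == reverse_values[k] for k in reverse_keys)
```

Under that assumption, indexing `reverse_values` directly gives the same answers as the dictionary, without paying for the hash-table insertions.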
Speedup ca. x1.5 cf. master branch on my machine with compressed bcolz. Approximate contributions:
with nogil, with gil construct