Open belm0 opened 5 years ago
This can be solved by moving find_bin
from Histogram1D to BinningBase (and including more efficient variants in daughter classes).
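The suggested refactor might look roughly like this. This is only a sketch: the class names come from physt, but the method bodies below are my own illustration, not the library's actual code. The point is that subclasses can override the generic binary search with cheaper arithmetic:

```python
import numpy as np

class BinningBase:
    """Generic binning: find_bin falls back to binary search over bin edges."""

    def __init__(self, edges):
        self.edges = np.asarray(edges, dtype=float)

    def find_bin(self, value):
        # O(log n) lookup; returns None for out-of-range values.
        i = int(np.searchsorted(self.edges, value, side="right")) - 1
        if i < 0 or i >= len(self.edges) - 1:
            return None
        return i

class FixedWidthBinning(BinningBase):
    """Fixed-width bins allow an O(1) arithmetic lookup instead."""

    def __init__(self, start, width, count):
        super().__init__(start + width * np.arange(count + 1))
        self.start, self.width, self.count = start, width, count

    def find_bin(self, value):
        # One subtraction and one division -- no search needed.
        i = int((value - self.start) // self.width)
        if i < 0 or i >= self.count:
            return None
        return i
```

With a split like this, fixed_width histograms get a constant-time lookup while irregular binnings keep the searchsorted path.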
I don't believe StreamHist uses fixed-width bins, yet it still manages a 5x faster update in pure Python. The README credits https://github.com/grantjenks/sorted_containers, if I understand correctly.
I probably won't be able to do a significant refactoring soon, but in any case I'd recommend using the "fill_n" method if you can.
```
In [26]: data = np.random.randn(100000)

In [27]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [28]: %time for d in data: HA.fill(d)
CPU times: user 3.86 s, sys: 56.4 ms, total: 3.92 s
Wall time: 3.84 s

In [29]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [30]: %time HA.fill_n(data)
CPU times: user 16.2 ms, sys: 4.01 ms, total: 20.2 ms
Wall time: 19.2 ms
```
Or, more realistically (simulating that the data come from somewhere one by one):
```
In [48]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [49]: %time l = []; [l.append(i) for i in data]; HA.fill_n(l)
CPU times: user 36.9 ms, sys: 4.04 ms, total: 40.9 ms
Wall time: 40.3 ms
```
My use case is real time, and spikes from fill_n() batches would be unwanted. Also, fill_n() is very slow for small arrays (probably because numpy is).
```
$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(.0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 35.3 usec per loop

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5)); d=[.1]*10' 'h.fill_n(d)'
500 loops, best of 5: 740 usec per loop
```
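A quick way to see that fixed per-call numpy overhead, independent of physt: compare a builtin reduction on a 10-element list with the equivalent numpy call. This is just a stand-in for the array conversion and dispatch work fill_n has to do before any binning happens:

```python
import timeit
import numpy as np

d = [0.1] * 10  # tiny input, like the fill_n benchmark above

# builtin sum walks the list directly; numpy must first build an
# ndarray and go through its dispatch machinery on every call
t_builtin = timeit.timeit(lambda: sum(d), number=20000)
t_numpy = timeit.timeit(lambda: np.add.reduce(np.asarray(d)), number=20000)
print(f"builtin: {t_builtin:.4f}s  numpy: {t_numpy:.4f}s")
```

The fixed cost is only amortized once the array has many elements, which matches the 100000-sample timings above.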
Ok, I'll try to optimize the single-value fill soon-ish.
Here is a fairer timing of streamhist. Since my previous test filled with a constant value, the compute-intensive merging of bins was never triggered.
```
$ python -m timeit -s 'from random import random; from streamhist import StreamHist; h = StreamHist();' 'h.update(random())'
10000 loops, best of 5: 35.4 usec per loop
```
That result is just with the default max bin count of 64. It gets worse quickly as max bins is increased. (Note: the overhead of random() is negligible, about 50 ns.)
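For context on why random data is so much slower: a streaming histogram in the Ben-Haim & Tom-Tov style (which streamhist follows, as far as I understand) inserts every value as its own unit bin and then merges the two closest bins once the bin budget is exceeded. A minimal sketch of that update, not streamhist's actual code:

```python
import bisect

def sh_update(centers, counts, value, max_bins):
    """One streaming-histogram update: insert value as a unit bin,
    then merge the closest adjacent pair if over budget."""
    i = bisect.bisect(centers, value)
    centers.insert(i, value)
    counts.insert(i, 1)
    if len(centers) > max_bins:
        # Scan for the adjacent pair with the smallest gap -- this is
        # O(n) in the number of bins, so cost grows with max_bins.
        gaps = [centers[j + 1] - centers[j] for j in range(len(centers) - 1)]
        j = gaps.index(min(gaps))
        total = counts[j] + counts[j + 1]
        # Replace the pair with its count-weighted mean.
        centers[j] = (centers[j] * counts[j] + centers[j + 1] * counts[j + 1]) / total
        counts[j] = total
        del centers[j + 1], counts[j + 1]
```

The closest-pair scan being linear in the number of bins is consistent with the timing getting worse as max bins is increased.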
However, physt is not off the hook so easily... I have an implementation working at 12 usec for the same max bin count. More at https://github.com/janpipek/physt/issues/58#issuecomment-486666287.
For the use case of collecting execution time samples during a program run (and ultimately reporting quantiles), I'd like fill() to be fairly fast. physt fill() execution time seems to be independent of binning strategy (trivially, using a constant data value in my tests). I was surprised that bin search is implemented via np.searchsorted() in all cases, even for fixed_width binning.

Comparing to (unmaintained) https://github.com/carsonfarmer/streamhist:
(aside: streamhist is quite nice about managing binning and being able to report arbitrary quantiles. Perhaps some of it could be adopted.)
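To make the searchsorted observation concrete: for fixed-width bins the index is pure arithmetic, so both the binary search and numpy's per-call overhead are avoidable. The edge values below are made up for illustration:

```python
import numpy as np

lo, w, n = 0.0, 0.01, 50  # hypothetical fixed-width binning
edges = lo + w * np.arange(n + 1)

def bin_search(x):
    # generic path: binary search over the edge array
    return int(np.searchsorted(edges, x, side="right")) - 1

def bin_direct(x):
    # fixed-width shortcut: one subtraction and one division
    return int((x - lo) / w)

# both agree for values away from bin boundaries
for x in (0.103, 0.257, 0.499):
    assert bin_search(x) == bin_direct(x)
```

(Exact bin edges need care with floating point, but that is equally true of the searchsorted path.)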