janpipek / physt

Python histogram library - histograms as updateable, fully semantic objects with visualization tools. [P]ython [HYST]ograms.
MIT License

fill() performance #57

Open · belm0 opened 5 years ago

belm0 commented 5 years ago

For the use case of collecting execution time samples during a program run (and ultimately reporting quantiles), I'd like fill() to be fairly fast.

physt's fill() execution time seems to be independent of the binning strategy (my tests trivially fill with a constant value). I was surprised that bin search is implemented via np.searchsorted() in all cases, even for fixed_width binning.

$ python -m timeit -s 'from physt import h1; h = h1(None, "exponential", 100, range=(1e-6, 1))' 'h.fill(.1)'
10000 loops, best of 5: 38 usec per loop

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 36.9 usec per loop

For comparison, the (unmaintained) https://github.com/carsonfarmer/streamhist:

$ python -m timeit -s 'from streamhist import StreamHist; h = StreamHist()' 'h.update(.1)'
50000 loops, best of 5: 7.52 usec per loop

(aside: streamhist does a nice job of managing binning and can report arbitrary quantiles. Perhaps some of it could be adopted.)

janpipek commented 5 years ago

This can be solved by moving find_bin from Histogram1D to BinningBase (and overriding it with more efficient variants in the subclasses).
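
For the fixed_width case in particular, a specialized find_bin could compute the bin index arithmetically in O(1) instead of calling np.searchsorted. A minimal sketch of the idea (class and method names are illustrative, not physt's actual layout):

import math

class FixedWidthBinFinder:
    """Illustrative O(1) bin lookup for equal-width bins."""

    def __init__(self, min_edge, bin_width, bin_count):
        self.min_edge = min_edge
        self.bin_width = bin_width
        self.bin_count = bin_count

    def find_bin(self, value):
        # Direct arithmetic replaces the O(log n) array search.
        i = math.floor((value - self.min_edge) / self.bin_width)
        return i if 0 <= i < self.bin_count else None  # None = out of range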

belm0 commented 5 years ago

I don't believe StreamHist uses fixed-width bins, yet its update is still ~5x faster in pure Python. The README credits https://github.com/grantjenks/sorted_containers, if I understand correctly.
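
For reference, SortedList keeps its contents ordered with fast inserts and O(log n) bisection, which is the operation a streaming histogram needs to find the nearest bin centroid. A minimal sketch of that lookup (an illustration of the general technique, assumed from the README credit, not streamhist's actual code):

from sortedcontainers import SortedList

centroids = SortedList([0.1, 0.25, 0.5, 0.9])

def nearest_centroid(value):
    # bisect_left runs in O(log n); only the neighbors on either
    # side of the insertion point can be the nearest centroid.
    i = centroids.bisect_left(value)
    candidates = centroids[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda c: abs(c - value))

print(nearest_centroid(0.3))  # -> 0.25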

janpipek commented 5 years ago

I probably won't be able to do any significant refactoring soon, but in any case I'd recommend using the fill_n method if you can.

In [26]: data = np.random.randn(100000)

In [27]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [28]: %time for d in data: HA.fill(d)
CPU times: user 3.86 s, sys: 56.4 ms, total: 3.92 s
Wall time: 3.84 s

In [29]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [30]: %time HA.fill_n(data)
CPU times: user 16.2 ms, sys: 4.01 ms, total: 20.2 ms
Wall time: 19.2 ms

Or, more realistically (simulating data arriving one value at a time):

In [48]: HA = physt.h1(None, "fixed_width", 0.1, adaptive=True)

In [49]: %time l = []; [l.append(i) for i in data]; HA.fill_n(l)
CPU times: user 36.9 ms, sys: 4.04 ms, total: 40.9 ms
Wall time: 40.3 ms
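
A middle ground between per-sample fill and one big fill_n is to buffer values and flush in fixed-size chunks. A minimal sketch of that pattern (BufferedH1 is a hypothetical wrapper, not part of physt):

import physt

class BufferedH1:
    """Hypothetical wrapper: buffer scalar samples, flush via fill_n in chunks."""

    def __init__(self, histogram, chunk_size=1024):
        self.histogram = histogram
        self.chunk_size = chunk_size
        self.buffer = []

    def fill(self, value):
        self.buffer.append(value)  # cheap per-sample list append
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        # One vectorized call amortizes numpy's per-call overhead.
        if self.buffer:
            self.histogram.fill_n(self.buffer)
            self.buffer = []

h = BufferedH1(physt.h1(None, "fixed_width", 0.1, adaptive=True))

Calling flush() at shutdown (or on a timer) picks up whatever remains in the buffer.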
belm0 commented 5 years ago

My use case is real-time, and latency spikes from batched fill_n() calls would be unwanted. Also, fill_n() is very slow for small arrays (probably because numpy is).

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(.0, .5))' 'h.fill(.1)'
10000 loops, best of 5: 35.3 usec per loop

$ python -m timeit -s 'from physt import h1; h = h1(None, "fixed_width", .01, range=(0, .5)); d=[.1]*10' 'h.fill_n(d)'
500 loops, best of 5: 740 usec per loop
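
To locate the batch size at which fill_n's fixed overhead amortizes out, one can sweep it with a small harness along these lines (a sketch only; absolute numbers will vary by machine):

import timeit
from physt import h1

for n in (1, 10, 100, 1000):
    h = h1(None, "fixed_width", .01, range=(0, .5))
    d = [.1] * n
    t = timeit.timeit(lambda: h.fill_n(d), number=1000)
    print(f"batch={n:5d}  {1e6 * t / (1000 * n):8.2f} usec/sample")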
janpipek commented 5 years ago

Ok, I'll try to optimize the single-value fill soon-ish.

belm0 commented 5 years ago

Here is a fairer timing of streamhist. Since my previous test filled with a constant value, the compute-intensive merging of bins was never triggered.

$ python -m timeit -s 'from random import random; from streamhist import StreamHist; h = StreamHist();' 'h.update(random())'
10000 loops, best of 5: 35.4 usec per loop

That result is with just the default max bin count of 64. It gets worse quickly as the max bin count increases. (Note: the overhead of random() is negligible, about 50 ns.)

However, physt is not off the hook so easily... I have an implementation working at 12 usec for the same max bin count. More at https://github.com/janpipek/physt/issues/58#issuecomment-486666287.
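
For context on why pure Python can win here: for a single scalar, numpy's per-call overhead dwarfs the actual search, while the stdlib bisect on a plain list of edges stays in the low-microsecond range. A minimal sketch of that kind of approach (illustrative only; not the actual implementation from #58):

from bisect import bisect_right

class TinyH1:
    """Illustrative pure-Python 1-D histogram with O(log n) scalar fill."""

    def __init__(self, edges):
        self.edges = list(edges)               # sorted bin edges
        self.counts = [0] * (len(self.edges) - 1)

    def fill(self, value):
        i = bisect_right(self.edges, value) - 1
        if 0 <= i < len(self.counts):          # ignore under-/overflow
            self.counts[i] += 1

h = TinyH1([i / 100 for i in range(51)])       # 50 bins on [0, 0.5]
h.fill(.1)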