Open mroeschke opened 3 years ago
As of
In [1]: pd.__version__
Out[1]: '1.5.0.dev0+110.g439906e07d'
[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method 12/30 failed
[ 75.00%] ··· ======== ================================ ============= =============
-- cols
----------------------------------------- ---------------------------
method engine_kwargs 1 100
======== ================================ ============= =============
sum ('cython', None) 351±5μs 2.33±0.02ms
sum ('numba', {'parallel': True}) 1.34±0.01ms 8.52±0.1ms
sum ('numba', {'parallel': False}) 1.07±0.01ms 12.1±0.05ms
max ('cython', None) failed failed
max ('numba', {'parallel': True}) failed failed
max ('numba', {'parallel': False}) failed failed
min ('cython', None) failed failed
min ('numba', {'parallel': True}) failed failed
min ('numba', {'parallel': False}) failed failed
var ('cython', None) 157±5μs 2.27±0.05ms
var ('numba', {'parallel': True}) 1.38±0.01ms 10.6±0.1ms
var ('numba', {'parallel': False}) 1.09±0.01ms 16.4±0.06ms
mean ('cython', None) 148±2μs 2.00±0.03ms
mean ('numba', {'parallel': True}) 1.40±0.01ms 10.7±0.2ms
mean ('numba', {'parallel': False}) 1.10±0.03ms 16.9±0.07ms
======== ================================ ============= =============
[100.00%] ··· rolling.NumbaVSCython.time_roll_method 12/30 failed
[100.00%] ··· ======== ================================ ========== ============
-- cols
----------------------------------------- -----------------------
method engine_kwargs 1 100
======== ================================ ========== ============
sum ('cython', None) 461±2μs 27.8±0.2ms
sum ('numba', {'parallel': True}) 365±10μs 12.3±0.2ms
sum ('numba', {'parallel': False}) 358±2μs 14.8±0.4ms
max ('cython', None) failed failed
max ('numba', {'parallel': True}) failed failed
max ('numba', {'parallel': False}) failed failed
min ('cython', None) failed failed
min ('numba', {'parallel': True}) failed failed
min ('numba', {'parallel': False}) failed failed
var ('cython', None) 598±9μs 34.7±2ms
var ('numba', {'parallel': True}) 428±10μs 13.1±0.4ms
var ('numba', {'parallel': False}) 447±3μs 19.7±0.3ms
mean ('cython', None) 502±4μs 30.8±2ms
mean ('numba', {'parallel': True}) 484±20μs 14.8±0.5ms
mean ('numba', {'parallel': False}) 496±20μs 25.6±1ms
======== ================================ ========== ============
Runs with different thread levels and more cols
% NUMBA_NUM_THREADS=4 asv run -b rolling.NumbaVSCython
[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method ok
[ 75.00%] ··· ======== ================================ ============= ============= =============
-- cols
----------------------------------------- -----------------------------------------
method engine_kwargs 1 100 1000
======== ================================ ============= ============= =============
sum ('cython', None) 379±50μs 2.35±0.2ms 19.8±0.3ms
sum ('numba', {'parallel': True}) 1.37±0.01ms 8.30±0.03ms 76.6±0.5ms
sum ('numba', {'parallel': False}) 1.09±0.01ms 12.0±0.01ms 119±0.9ms
max ('cython', None) 307±2μs 2.40±0.01ms 20.4±0.08ms
max ('numba', {'parallel': True}) 3.11±0.02ms 70.0±3ms 695±20ms
max ('numba', {'parallel': False}) 2.24±0.01ms 131±0.3ms 1.36±0s
min ('cython', None) 309±2μs 2.44±0.01ms 21.0±0.06ms
min ('numba', {'parallel': True}) 3.11±0.01ms 69.3±3ms 705±2ms
min ('numba', {'parallel': False}) 2.24±0.01ms 132±0.4ms 1.37±0.01s
var ('cython', None) 155±0.5μs 2.24±0.08ms 21.5±0.3ms
var ('numba', {'parallel': True}) 1.41±0.01ms 10.3±0.06ms 101±1ms
var ('numba', {'parallel': False}) 1.13±0.01ms 16.3±0.1ms 169±1ms
mean ('cython', None) 146±0.7μs 1.95±0.02ms 18.7±0.5ms
mean ('numba', {'parallel': True}) 1.45±0.1ms 10.8±0.6ms 107±3ms
mean ('numba', {'parallel': False}) 1.13±0.01ms 16.6±0.09ms 188±1ms
======== ================================ ============= ============= =============
[100.00%] ··· rolling.NumbaVSCython.time_roll_method ok
[100.00%] ··· ======== ================================ ============= ============ ==========
-- cols
----------------------------------------- -------------------------------------
method engine_kwargs 1 100 1000
======== ================================ ============= ============ ==========
sum ('cython', None) 452±5μs 26.3±0.2ms 309±20ms
sum ('numba', {'parallel': True}) 372±20μs 12.6±1ms 132±3ms
sum ('numba', {'parallel': False}) 346±0.7μs 14.4±0.3ms 190±2ms
max ('cython', None) 641±3μs 46.0±1ms 504±2ms
max ('numba', {'parallel': True}) 3.18±0.2ms 32.4±0.5ms 328±3ms
max ('numba', {'parallel': False}) 2.87±0.3ms 59.0±0.4ms 614±2ms
min ('cython', None) 653±10μs 43.9±0.1ms 503±3ms
min ('numba', {'parallel': True}) 1.13±0.06ms 30.4±0.3ms 327±2ms
min ('numba', {'parallel': False}) 1.28±0ms 58.1±0.8ms 614±2ms
var ('cython', None) 586±1μs 32.7±0.7ms 359±1ms
var ('numba', {'parallel': True}) 440±10μs 13.0±0.4ms 140±5ms
var ('numba', {'parallel': False}) 450±6μs 20.0±0.3ms 240±1ms
mean ('cython', None) 468±3μs 26.9±0.2ms 319±3ms
mean ('numba', {'parallel': True}) 457±10μs 14.2±0.5ms 152±2ms
mean ('numba', {'parallel': False}) 472±9μs 24.3±0.3ms 286±2ms
======== ================================ ============= ============ ==========
% NUMBA_NUM_THREADS=2 asv run -b rolling.NumbaVSCython
[ 75.00%] ··· rolling.NumbaVSCython.time_gb_method ok
[ 75.00%] ··· ======== ================================ ============= ============= ============
-- cols
----------------------------------------- ----------------------------------------
method engine_kwargs 1 100 1000
======== ================================ ============= ============= ============
sum ('cython', None) 438±100μs 3.90±2ms 38.1±20ms
sum ('numba', {'parallel': True}) 1.87±1ms 12.5±6ms 80.5±10ms
sum ('numba', {'parallel': False}) 1.12±0.08ms 12.3±0.7ms 120±0.6ms
max ('cython', None) 308±1μs 2.31±0.02ms 20.5±0.1ms
max ('numba', {'parallel': True}) 2.30±0.01ms 72.7±0.5ms 758±6ms
max ('numba', {'parallel': False}) 2.24±0.02ms 131±0.4ms 1.36±0s
min ('cython', None) 306±3μs 2.40±0.03ms 20.9±0.1ms
min ('numba', {'parallel': True}) 2.31±0.01ms 71.2±0.7ms 752±7ms
min ('numba', {'parallel': False}) 2.23±0ms 132±0.5ms 1.38±0s
var ('cython', None) 156±1μs 2.16±0.08ms 21.5±0.2ms
var ('numba', {'parallel': True}) 1.17±0.01ms 10.8±0.6ms 104±1ms
var ('numba', {'parallel': False}) 1.13±0.01ms 16.4±0.08ms 169±0.9ms
mean ('cython', None) 146±3μs 2.01±0.03ms 18.3±0.3ms
mean ('numba', {'parallel': True}) 1.17±0.01ms 11.2±0.3ms 113±2ms
mean ('numba', {'parallel': False}) 1.13±0.01ms 17.0±0.3ms 187±1ms
======== ================================ ============= ============= ============
[100.00%] ··· rolling.NumbaVSCython.time_roll_method ok
[100.00%] ··· ======== ================================ ============= ============ ===========
-- cols
----------------------------------------- --------------------------------------
method engine_kwargs 1 100 1000
======== ================================ ============= ============ ===========
sum ('cython', None) 455±6μs 26.4±0.4ms 306±2ms
sum ('numba', {'parallel': True}) 286±1μs 10.4±0.4ms 131±0.3ms
sum ('numba', {'parallel': False}) 351±3μs 14.5±0.3ms 190±1ms
max ('cython', None) 633±5μs 44.6±0.7ms 505±1ms
max ('numba', {'parallel': True}) 2.04±0.02ms 33.5±0.1ms 358±3ms
max ('numba', {'parallel': False}) 2.45±0.03ms 59.4±0.4ms 609±1ms
min ('cython', None) 647±5μs 45.7±0.2ms 505±3ms
min ('numba', {'parallel': True}) 798±5μs 32.2±0.4ms 348±4ms
min ('numba', {'parallel': False}) 1.27±0.01ms 58.2±0.4ms 608±0.8ms
var ('cython', None) 585±6μs 32.6±0.8ms 370±6ms
var ('numba', {'parallel': True}) 336±0.8μs 13.1±0.4ms 157±2ms
var ('numba', {'parallel': False}) 455±3μs 20.1±0.3ms 239±0.7ms
mean ('cython', None) 468±2μs 27.0±0.6ms 316±2ms
mean ('numba', {'parallel': True}) 396±8μs 15.5±0.5ms 182±0.9ms
mean ('numba', {'parallel': False}) 452±4μs 24.6±0.5ms 273±1ms
======== ================================ ============= ============ ===========
No modifications, parametrized over threads (no change from above)
[ 75.00%] ··· ======== ================================ ====== ============= =============
-- threads
------------------------------------------------ ---------------------------
method engine_kwargs cols 2 4
======== ================================ ====== ============= =============
sum ('cython', None) 1 363±10μs 376±20μs
sum ('cython', None) 100 2.38±0.1ms 2.45±0.7ms
sum ('cython', None) 1000 19.8±0.8ms 21.1±4ms
sum ('numba', {'parallel': True}) 1 1.15±0.08ms 1.36±0.03ms
sum ('numba', {'parallel': True}) 100 8.54±0.3ms 8.44±0.3ms
sum ('numba', {'parallel': True}) 1000 82.3±8ms 82.1±3ms
sum ('numba', {'parallel': False}) 1 1.08±0.01ms 1.09±0.01ms
sum ('numba', {'parallel': False}) 100 12.0±0.03ms 12.0±0.09ms
sum ('numba', {'parallel': False}) 1000 121±2ms 121±2ms
max ('cython', None) 1 315±1μs 317±0.2μs
max ('cython', None) 100 2.39±0.01ms 2.39±0.01ms
max ('cython', None) 1000 20.5±0.2ms 20.5±0.2ms
max ('numba', {'parallel': True}) 1 2.33±0.01ms 3.10±0.01ms
max ('numba', {'parallel': True}) 100 73.7±1ms 70.1±1ms
max ('numba', {'parallel': True}) 1000 767±3ms 712±0.7ms
max ('numba', {'parallel': False}) 1 2.25±0.02ms 2.26±0.04ms
max ('numba', {'parallel': False}) 100 132±2ms 131±2ms
max ('numba', {'parallel': False}) 1000 1.37±0s 1.37±0s
min ('cython', None) 1 312±1μs 315±0.3μs
min ('cython', None) 100 2.43±0.01ms 2.43±0.01ms
min ('cython', None) 1000 20.9±0.1ms 20.9±0.2ms
min ('numba', {'parallel': True}) 1 2.31±0ms 3.13±0.01ms
min ('numba', {'parallel': True}) 100 71.7±0.3ms 69.5±2ms
min ('numba', {'parallel': True}) 1000 766±2ms 714±4ms
min ('numba', {'parallel': False}) 1 2.30±0.04ms 2.26±0.02ms
min ('numba', {'parallel': False}) 100 133±1ms 132±1ms
min ('numba', {'parallel': False}) 1000 1.40±0.03s 1.38±0s
var ('cython', None) 1 163±1μs 161±3μs
var ('cython', None) 100 2.17±0.03ms 2.31±0.04ms
var ('cython', None) 1000 21.5±0.1ms 21.5±0.07ms
var ('numba', {'parallel': True}) 1 1.17±0.01ms 1.42±0.02ms
var ('numba', {'parallel': True}) 100 10.9±0.02ms 10.5±0.1ms
var ('numba', {'parallel': True}) 1000 104±1ms 101±2ms
var ('numba', {'parallel': False}) 1 1.13±0ms 1.15±0.01ms
var ('numba', {'parallel': False}) 100 16.3±0.1ms 16.4±0.09ms
var ('numba', {'parallel': False}) 1000 170±2ms 170±1ms
mean ('cython', None) 1 154±1μs 154±0.9μs
mean ('cython', None) 100 1.99±0.04ms 1.95±0.05ms
mean ('cython', None) 1000 19.1±0.9ms 18.8±0.3ms
mean ('numba', {'parallel': True}) 1 1.18±0.01ms 1.44±0.01ms
mean ('numba', {'parallel': True}) 100 10.9±0.03ms 10.5±0.1ms
mean ('numba', {'parallel': True}) 1000 118±1ms 104±3ms
mean ('numba', {'parallel': False}) 1 1.15±0.01ms 1.14±0.01ms
mean ('numba', {'parallel': False}) 100 16.6±0.03ms 17.0±0.2ms
mean ('numba', {'parallel': False}) 1000 186±1ms 187±0.9ms
======== ================================ ====== ============= =============
[100.00%] ··· rolling.NumbaVSCython.time_roll_method ok
[100.00%] ··· ======== ================================ ============= ============= ============= ============= ========== ==========
-- cols / threads
----------------------------------------- -----------------------------------------------------------------------------
method engine_kwargs 1 / 2 1 / 4 100 / 2 100 / 4 1000 / 2 1000 / 4
======== ================================ ============= ============= ============= ============= ========== ==========
sum ('cython', None) 459±7μs 461±5μs 26.3±0.2ms 26.0±0.09ms 308±3ms 308±4ms
sum ('numba', {'parallel': True}) 293±2μs 371±20μs 10.4±0.4ms 12.1±0.5ms 131±1ms 131±3ms
sum ('numba', {'parallel': False}) 370±8μs 367±5μs 14.4±0.3ms 14.9±0.3ms 192±1ms 191±3ms
max ('cython', None) 646±10μs 642±2μs 45.5±1ms 44.1±0.2ms 504±2ms 505±3ms
max ('numba', {'parallel': True}) 2.07±0.01ms 3.33±0.2ms 33.9±0.4ms 32.0±0.5ms 362±2ms 329±2ms
max ('numba', {'parallel': False}) 2.49±0.01ms 2.48±0.01ms 58.1±0.3ms 59.1±0.6ms 611±3ms 614±10ms
min ('cython', None) 654±5μs 652±5μs 44.2±0.08ms 44.7±0.8ms 506±5ms 510±5ms
min ('numba', {'parallel': True}) 812±10μs 1.12±0.05ms 33.0±0.5ms 30.8±0.9ms 364±1ms 331±4ms
min ('numba', {'parallel': False}) 1.31±0.02ms 1.29±0.01ms 57.7±0.4ms 57.7±0.4ms 616±2ms 614±4ms
var ('cython', None) 599±2μs 597±8μs 31.8±0.5ms 31.7±0.1ms 368±3ms 366±1ms
var ('numba', {'parallel': True}) 343±0.6μs 442±20μs 12.8±0.4ms 13.2±0.4ms 158±2ms 141±3ms
var ('numba', {'parallel': False}) 459±4μs 461±4μs 20.0±0.3ms 19.8±0.3ms 239±2ms 240±3ms
mean ('cython', None) 485±6μs 480±10μs 27.1±0.3ms 27.0±0.1ms 323±3ms 322±2ms
mean ('numba', {'parallel': True}) 393±3μs 476±8μs 15.3±0.4ms 14.6±0.4ms 181±3ms 156±3ms
mean ('numba', {'parallel': False}) 459±1μs 466±1μs 24.8±0.2ms 24.5±0.6ms 288±5ms 282±10ms
======== ================================ ============= ============= ============= ============= ========== ==========
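For a quick read of the tables above, here is a small sketch that computes Cython-to-numba time ratios for a few of the 1000-column, 4-thread entries. The timings are copied from the tables; the dictionary layout and the `speedup` helper are only for illustration and are not part of the benchmark suite.

```python
# Mean timings in ms, copied from the 1000-column / 4-thread entries above
timings_ms = {
    ("groupby", "sum"): {"cython": 21.1, "numba_parallel": 82.1},
    ("groupby", "max"): {"cython": 20.5, "numba_parallel": 712.0},
    ("rolling", "sum"): {"cython": 308.0, "numba_parallel": 131.0},
    ("rolling", "max"): {"cython": 505.0, "numba_parallel": 329.0},
}


def speedup(entry):
    """Return cython time / numba time; values above 1 mean numba is faster."""
    return entry["cython"] / entry["numba_parallel"]


ratios = {key: speedup(entry) for key, entry in timings_ms.items()}
for (bench, method), ratio in ratios.items():
    print(f"{bench:8s} {method:4s} cython/numba = {ratio:.2f}")
```

With these numbers, parallel numba comes out faster than Cython for the rolling cases and slower for the groupby cases.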
xref https://github.com/numba/numba/issues/4031, but our functions do not specify parallel=False in the inner function kernels
Some other parallel diagnostics from this local timeit test setup
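import numba
import numpy as np
import pandas as pd

cols = 1000
df = pd.DataFrame(np.random.randn(10_000, cols))
roll = df.rolling(100)
# First call compiles and caches the numba kernel so the timing below excludes JIT time
roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})
# IPython magic: time the already-compiled kernel
%timeit roll.mean(engine="numba", engine_kwargs={"nopython": True, "nogil": True, "parallel": True})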
Threading backend
Backend | timeit |
---|---|
omp | 209 ms ± 7.12 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
tbb | 221 ms ± 5.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
workqueue | 220 ms ± 6.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
Setting threads with omp backend
Threads | timeit |
---|---|
1 | 347 ms ± 26 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
2 | 201 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
3 | 206 ms ± 4.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
Numba function setup
Setup | timeit |
---|---|
2D w/ np.nanmean | 933 ms ± 16 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
2D w/ custom nanmean | 634 ms ± 34 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
As of
From this ASV, `mean` and `sum` have the sliding algorithms implemented. `min`, `max`, and `median` use the `np.nan<method>` functions.
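To make the distinction concrete, here is a minimal pure-Python sketch contrasting a sliding (add/remove) rolling mean with one that recomputes a NaN-skipping mean for every window. The function names `rolling_mean_naive` and `rolling_mean_sliding` are hypothetical and are not the pandas kernels. As a simplification, both functions skip NaNs inside a window instead of applying pandas' `min_periods` rules.

```python
import math


def rolling_mean_naive(values, window):
    """Recompute the NaN-skipping mean from scratch per window: O(n * window)."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(math.nan)  # window not yet full
            continue
        valid = [v for v in values[i + 1 - window : i + 1] if not math.isnan(v)]
        out.append(sum(valid) / len(valid) if valid else math.nan)
    return out


def rolling_mean_sliding(values, window):
    """Maintain a running sum and count, updating them as the window moves: O(n)."""
    out = []
    total = 0.0
    count = 0
    for i, v in enumerate(values):
        # Add the value entering the window
        if not math.isnan(v):
            total += v
            count += 1
        # Remove the value leaving the window
        if i >= window:
            old = values[i - window]
            if not math.isnan(old):
                total -= old
                count -= 1
        if i + 1 < window:
            out.append(math.nan)  # window not yet full
        else:
            out.append(total / count if count else math.nan)
    return out


data = [1.0, 2.0, math.nan, 4.0, 5.0]
naive = rolling_mean_naive(data, 2)
sliding = rolling_mean_sliding(data, 2)
```

Both functions return the same values for `data`, but the naive version touches every element of every window, which is consistent with `min`/`max` scaling worse than `mean`/`sum` in the numba timings above.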