numba / numba-benchmark

initial stab at the numba.typed.List benchmarks #10

Open esc opened 4 years ago

esc commented 4 years ago

This is an initial stab at the ASV tests for the numba.typed.List.

Things that still need to be decided on:

Lastly, here is a snapshot of what it looks like when run on my laptop:

 💣 zsh» asv run --show-stderr  -b 'bench_typed_list*'
· Fetching recent changes.
· Creating environments
· Discovering benchmarks
· Running 4 total benchmarks (1 commits * 1 environments * 4 benchmarks)
[  0.00%] · For numba commit 1949c62e <master>:
[  0.00%] ·· Benchmarking conda-py3.6-cudatoolkit-llvmlite-numpy
[ 12.50%] ··· Running (bench_typed_list.ConstructionSuite.time_construct_from_python_list--)....
[ 62.50%] ··· bench_typed_list.ConstructionSuite.time_construct_from_python_list                                                                               97.8±1ms
[ 75.00%] ··· bench_typed_list.ConstructionSuite.time_construct_in_njit_function                                                                             2.61±0.1ms
[ 87.50%] ··· bench_typed_list.ReductionSuite.time_reduction_sum                                                                                                203±9μs
[100.00%] ··· bench_typed_list.SortSuite.time_sort                                                                                                              117±4ms
asv run --show-stderr -b 'bench_typed_list*'  23.89s user 1.37s system 85% cpu 29.624 total

sklam commented 4 years ago

We do want to measure compile time. The benchmark can make a fresh dispatcher and run its .compile method explicitly.

esc commented 4 years ago

I have updated the tests to seed the RNG and to time compilation. Here is a snapshot of a run on my machine:

[ 56.25%] ··· bench_typed_list.ConstructionSuite.time_construct_from_python_list                                                                               96.9±2ms
[ 62.50%] ··· bench_typed_list.ConstructionSuite.time_construct_in_njit_function                                                                            2.17±0.09ms
[ 68.75%] ··· bench_typed_list.ReductionSuite.time_compile_reduction_sum_fastmath                                                                            21.1±0.4μs
[ 75.00%] ··· bench_typed_list.ReductionSuite.time_compile_reduction_sum_no_fastmath                                                                         21.8±0.1μs
[ 81.25%] ··· bench_typed_list.ReductionSuite.time_execute_reduction_sum_fastmath                                                                               195±6μs
[ 87.50%] ··· bench_typed_list.ReductionSuite.time_execute_reduction_sum_no_fastmath                                                                            201±2μs
[ 93.75%] ··· bench_typed_list.SortSuite.time_compile_sort                                                                                                      216±3ms
[100.00%] ··· bench_typed_list.SortSuite.time_execute_sort                                                                                                      116±3ms
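
Seeding the RNG, as mentioned above, makes every run operate on identical input, so timing differences between commits are not caused by different data. The sketch below illustrates the idea with the standard library only; it sorts a plain Python list rather than a `numba.typed.List`, and the class name `SeededSortSketch`, `SEED` and `N` are assumptions, not names from this PR.

```python
import random

SEED = 0  # assumed value; any fixed seed works
N = 1000  # assumed list size


class SeededSortSketch:
    def setup(self):
        # Re-seed before each benchmark so the data is always the same.
        random.seed(SEED)
        self.data = [random.randint(0, N) for _ in range(N)]

    def time_execute_sort(self):
        # sorted() copies, so self.data stays unsorted between repeats.
        sorted(self.data)
```

Sorting a copy matters for repeated timing: an in-place sort would leave already-sorted input for every repetition after the first.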

esc commented 4 years ago

After recent updates, these are the current benchmark results for the changes introduced by https://github.com/numba/numba/pull/6278

All benchmarks:

       before           after         ratio
     [3b3eab89]       [05ce51c6]
     <pull/5543/merge~1>       <pull/6278/head~1>
          103±3ms          103±3ms     1.00  bench_typed_list.ConstructionSuite.time_construct_from_python_list
      2.37±0.09ms      2.39±0.07ms     1.01  bench_typed_list.ConstructionSuite.time_construct_in_njit_function
         36.0±4μs      28.0±0.95μs    ~0.78  bench_typed_list.ForLoopReductionSuite.time_compile_reduction_sum_fastmath
       41.6±9.1μs       29.4±3.7μs    ~0.71  bench_typed_list.ForLoopReductionSuite.time_compile_reduction_sum_no_fastmath
      3.53±0.34ms      2.52±0.02ms    ~0.71  bench_typed_list.ForLoopReductionSuite.time_execute_reduction_sum_fastmath
      3.20±0.21ms       2.75±0.2ms    ~0.86  bench_typed_list.ForLoopReductionSuite.time_execute_reduction_sum_no_fastmath
         62.6±2ms         62.7±2ms     1.00  bench_typed_list.ForLoopReductionSuiteFloat.time_compile_reduction_sum_fastmath
         61.9±1ms       61.2±0.7ms     0.99  bench_typed_list.ForLoopReductionSuiteFloat.time_compile_reduction_sum_no_fastmath
      2.59±0.03ms      2.52±0.07ms     0.97  bench_typed_list.ForLoopReductionSuiteFloat.time_execute_reduction_sum_fastmath
      2.58±0.04ms      2.50±0.04ms     0.97  bench_typed_list.ForLoopReductionSuiteFloat.time_execute_reduction_sum_no_fastmath
       29.1±0.8μs       28.0±0.6μs     0.96  bench_typed_list.ForLoopReductionSuiteInt.time_compile_reduction_sum_fastmath
       29.3±0.5μs       31.6±0.4μs     1.08  bench_typed_list.ForLoopReductionSuiteInt.time_compile_reduction_sum_no_fastmath
       2.64±0.1ms      2.55±0.04ms     0.97  bench_typed_list.ForLoopReductionSuiteInt.time_execute_reduction_sum_fastmath
       2.69±0.1ms      2.47±0.07ms     0.92  bench_typed_list.ForLoopReductionSuiteInt.time_execute_reduction_sum_no_fastmath
       26.6±2.1μs      22.7±0.28μs    ~0.85  bench_typed_list.IteratorReductionSuite.time_compile_reduction_sum_fastmath
       24.8±2.2μs      23.4±0.32μs     0.94  bench_typed_list.IteratorReductionSuite.time_compile_reduction_sum_no_fastmath
        234±5.1μs        217±3.8μs     0.93  bench_typed_list.IteratorReductionSuite.time_execute_reduction_sum_fastmath
         207±15μs          223±7μs     1.08  bench_typed_list.IteratorReductionSuite.time_execute_reduction_sum_no_fastmath
+      23.3±0.8μs       26.7±0.9μs     1.14  bench_typed_list.IteratorReductionSuiteFloat.time_compile_reduction_sum_fastmath
-      27.0±0.8μs       24.4±0.3μs     0.90  bench_typed_list.IteratorReductionSuiteFloat.time_compile_reduction_sum_no_fastmath
         227±20μs          271±4μs    ~1.19  bench_typed_list.IteratorReductionSuiteFloat.time_execute_reduction_sum_fastmath
         231±20μs         246±10μs     1.07  bench_typed_list.IteratorReductionSuiteFloat.time_execute_reduction_sum_no_fastmath
       22.3±0.4μs       26.1±0.7μs    ~1.17  bench_typed_list.IteratorReductionSuiteInt.time_compile_reduction_sum_fastmath
+      22.8±0.5μs       27.2±0.2μs     1.19  bench_typed_list.IteratorReductionSuiteInt.time_compile_reduction_sum_no_fastmath
          212±7μs          235±4μs    ~1.11  bench_typed_list.IteratorReductionSuiteInt.time_execute_reduction_sum_fastmath
          198±5μs          231±4μs    ~1.17  bench_typed_list.IteratorReductionSuiteInt.time_execute_reduction_sum_no_fastmath
              n/a              n/a      n/a  bench_typed_list.ReductionSuite.time_compile_reduction_sum_fastmath
              n/a              n/a      n/a  bench_typed_list.ReductionSuite.time_compile_reduction_sum_no_fastmath
              n/a              n/a      n/a  bench_typed_list.ReductionSuite.time_execute_reduction_sum_fastmath
              n/a              n/a      n/a  bench_typed_list.ReductionSuite.time_execute_reduction_sum_no_fastmath
          235±6ms          233±5ms     0.99  bench_typed_list.SortSuite.time_compile_sort
          120±2ms          113±2ms     0.94  bench_typed_list.SortSuite.time_execute_sort
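
To read the table above: the ratio column is the "after" timing divided by the "before" timing, so values below 1 mean the "after" commit was faster. As far as I understand `asv compare`, a leading `+` or `-` marks a change beyond its threshold factor (worse or better), and a `~` before the ratio marks a change beyond the threshold that was not judged statistically significant. The sketch below recomputes a few ratios from the rounded values in the table; the helper names and the threshold of 1.1 are assumptions.

```python
def ratio(before, after):
    """Return after/before, the quantity shown in the ratio column."""
    return after / before


def exceeds_threshold(before, after, factor=1.1):
    """True if the change is larger than `factor` in either direction."""
    r = ratio(before, after)
    return r > factor or r < 1 / factor


# (before, after) mean timings in seconds, copied from rows of the table.
rows = {
    "ConstructionSuite.time_construct_in_njit_function": (2.37e-3, 2.39e-3),
    "IteratorReductionSuiteFloat.time_compile_reduction_sum_no_fastmath": (27.0e-6, 24.4e-6),
    "IteratorReductionSuiteInt.time_compile_reduction_sum_no_fastmath": (22.8e-6, 27.2e-6),
    "SortSuite.time_execute_sort": (120e-3, 113e-3),
}

for name, (before, after) in rows.items():
    flag = "*" if exceeds_threshold(before, after) else " "
    print(f"{flag} {ratio(before, after):.2f}  {name}")
```

Because the table shows rounded timings, a ratio recomputed this way can differ from the printed one in the last digit for some rows.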