rtosholdings / riptable

64bit multithreaded python data analytics tools for numpy arrays and datasets
https://riptable.readthedocs.io/en/stable/

benchmarking #27

Open mattip opened 3 years ago

mattip commented 3 years ago

It would be good to provide a set of benchmarks for riptable. I would suggest adopting the NumPy benchmark suite from numpy/benchmarks, which is built to use the Airspeed Velocity (ASV) benchmark system.
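
For readers unfamiliar with ASV, here is a minimal sketch of what a parameterized benchmark in that style tends to look like. The class name, parameter values, and the two timed operations are illustrative assumptions, not taken from the NumPy suite or from riptable; a riptable version would presumably build the same inputs as riptable arrays instead of plain NumPy ones.

```python
import numpy as np


class TimeReductions:
    """Hypothetical ASV-style benchmark class (name and contents are illustrative).

    asv discovers methods whose names start with ``time_`` and times them
    once for every combination of the values listed in ``params``.
    """

    # One list of values per parameter; asv runs the full cross product.
    params = [["int32", "int64", "float64"], [1_000, 1_000_000]]
    param_names = ["dtype", "length"]

    def setup(self, dtype, length):
        # setup() is called outside the timed region, so building the
        # input array does not count toward the measurement.
        rng = np.random.default_rng(12345)
        self.data = rng.integers(0, 100, size=length).astype(dtype)

    def time_sum(self, dtype, length):
        self.data.sum()

    def time_sort(self, dtype, length):
        np.sort(self.data)
```

Files like this normally live in a benchmarks directory next to an asv configuration file and are executed with `asv run`.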

jack-pappas commented 3 years ago

@mattip I'll take a look at what's needed to adopt the numpy benchmark suite for riptable; perhaps we can also submit some additional benchmarks, or expand the existing ones, in the numpy ASV suite.

riptable does have a small, built-in set of benchmarks, the idea being that (at some point in the future) they could be run after installation and used to help tune various settings around block/chunk sizes and threading: https://github.com/rtosholdings/riptable/tree/b38746bddf4e40a3187bb8a251917ef9b444e15b/riptable/benchmarks

I pulled in a small document I wrote earlier on how to run the benchmarks and analyze the results: https://github.com/rtosholdings/riptable/blob/b38746bddf4e40a3187bb8a251917ef9b444e15b/docs/source/benchmarking.rst

One feature of riptable's built-in benchmarking suite that I've found handy is that the runner preserves all of the raw timing data (as it would for any other statistics we wanted to collect, e.g. memory utilization) and returns it in a Dataset. The raw data can be passed through a summarization function if you just want straightforward statistics such as min/median/max per (dtype, length) tuple, and/or saved to disk and later loaded into a notebook for a more thorough analysis.
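
The "keep the raw data, summarize later" idea can be sketched roughly as follows. This is not riptable's runner: `run_benchmark` and `summarize` are hypothetical helpers, and a plain list of dicts stands in for the Dataset the real runner returns. It only illustrates recording one row per timed run and computing min/median/max per (dtype, length) as a separate step.

```python
import statistics
import timeit
from collections import defaultdict

import numpy as np


def run_benchmark(func, dtypes, lengths, repeats=5):
    """Hypothetical runner: return one raw record per timed run, unaggregated."""
    records = []
    for dtype in dtypes:
        for length in lengths:
            data = np.arange(length, dtype=dtype)
            # timeit.repeat returns a list of `repeats` elapsed times in seconds.
            timings = timeit.repeat(lambda: func(data), number=1, repeat=repeats)
            for elapsed in timings:
                records.append({"dtype": dtype, "length": length, "seconds": elapsed})
    return records


def summarize(records):
    """Hypothetical summarizer: min/median/max seconds per (dtype, length)."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["dtype"], rec["length"])].append(rec["seconds"])
    return {
        key: {
            "min": min(times),
            "median": statistics.median(times),
            "max": max(times),
        }
        for key, times in groups.items()
    }


# The raw records stay available for deeper analysis; the summary is derived.
raw = run_benchmark(np.sum, dtypes=["int32", "float64"], lengths=[1_000, 100_000])
summary = summarize(raw)
```

Because `raw` is never discarded, it could just as well be written to disk and re-analyzed later with different statistics, which is the property described above.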