scikit-hep / histbook

Versatile, high-performance histogram toolkit for Numpy.
BSD 3-Clause "New" or "Revised" License
109 stars, 9 forks

Feature fill sparksql #12

Closed: jpivarski closed this 6 years ago

jpivarski commented 6 years ago

Added SparkSQL filling: in PySpark, hist.fill(df) uses Spark to compute the increments for each histogram bin and then adds those increments into hist.
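The two-step idea described above (compute the bin increments where the data lives, then add them into the histogram) can be sketched in plain Python. This is only a conceptual illustration, not histbook's or Spark's actual code: the binning constants, the helper names (bin_index, partition_increments, add_increments), and the list of lists standing in for a partitioned DataFrame are all assumptions made for the example.

```python
# Conceptual sketch only: plain Python, not histbook's or Spark's implementation.
from functools import reduce

NUM_BINS, LOW, HIGH = 5, 0.0, 10.0  # hypothetical regular binning


def bin_index(x):
    """Map a value to its bin index, or None if it lies outside [LOW, HIGH)."""
    if not (LOW <= x < HIGH):
        return None
    return int((x - LOW) / (HIGH - LOW) * NUM_BINS)


def partition_increments(values, weights=None):
    """Compute the per-bin increments contributed by one partition of data."""
    increments = [0.0] * NUM_BINS
    if weights is None:
        weights = [1.0] * len(values)
    for x, w in zip(values, weights):
        i = bin_index(x)
        if i is not None:
            increments[i] += w
    return increments


def add_increments(a, b):
    """Combine two arrays of bin contents bin by bin."""
    return [x + y for x, y in zip(a, b)]


# Stand-in for a distributed DataFrame that has been split into partitions.
partitions = [[0.5, 1.5, 2.5], [2.7, 9.9, 11.0], [4.2, 4.4]]

# Each partition yields its own increments, which are then summed together.
total = reduce(add_increments, (partition_increments(p) for p in partitions))

# Final step: add the summed increments into the existing histogram contents.
hist_counts = [1.0, 0.0, 0.0, 0.0, 0.0]  # the histogram already holds one entry
hist_counts = add_increments(hist_counts, total)
```

Because only the small array of increments travels back from each partition, the histogram object itself never has to be shipped to the workers in this sketch; whether histbook arranges things exactly this way is not stated in the description above.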

Tested: all axis types, weights, and some simple expressions.

Implemented but untested: profile plots and Book.fill.

Note: filling is not thread-safe yet (in general, not just with Spark).
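Until filling is thread-safe, a caller that fills one histogram from several threads would need to serialize the updates itself. The sketch below shows one generic way to do that with a lock from the Python standard library. It is not part of histbook: the counts list, counts_lock, and the fill function are hypothetical stand-ins for a shared histogram and its fill step.

```python
# Generic illustration with the standard library; not histbook code.
import threading

NUM_BINS = 4
counts = [0] * NUM_BINS          # hypothetical shared histogram contents
counts_lock = threading.Lock()   # guards every update to `counts`


def fill(values):
    """Bin values into a private array, then merge into `counts` under the lock."""
    local = [0] * NUM_BINS
    for v in values:
        if 0 <= v < NUM_BINS:
            local[int(v)] += 1
    # Only the short merge step holds the lock, so binning runs unserialized.
    with counts_lock:
        for i, increment in enumerate(local):
            counts[i] += increment


data = [0.5, 1.5, 2.5, 3.5] * 1000
threads = [threading.Thread(target=fill, args=(data,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Binning into a private array first keeps the locked region small, which mirrors the compute-increments-then-add pattern used for the Spark fill.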