Closed by mbaak 5 years ago
Sorry—it's hard to reach everybody who uses a piece of software because I don't know who they are or if there are any users. Histogrammar has been deprecated for some time. (Actually, how are we writing on this issue tab? I thought I archived all these repos...)
In fact, it has a successor, histbook, that has also been deprecated.
The situation for HEP-style histogramming in Python is one where we're still trying to get our act together. It's not hard to write a histogramming package—I filled a slide with lots of names in this talk—but it is hard to get the last 20%, to make it mature and usable in the real world. Also, having too many packages to choose from is itself a bad thing.
There were several developments I didn't know about when I started Histogrammar and histbook, notably Boost.Histogram and its Python bindings. @henryiii has a broad vision of histogramming in Python and I think he'll make a product that will stand the test of time. Rather than competing in this space, I'm contributing histogram interface ideas (such as those from Histogrammar and histbook) and implementations (I'm going to add Numba support to his package someday) to boost-histogram (the Python wrapper around the C++ library) and hist (the high-level package).
However, your main interest is Spark, which is sufficiently different from the Python histogramming that it wouldn't be a bad thing for it to be an independent implementation. You based your work on Histogrammar, but it could be a project on its own. It would be particularly nice if its interface converged with Henry's package, so that users could readily move between them, despite the fact that you don't want any underlying C++ because that would be hard to run on Spark.
Also, did you remove the dependency between Histogrammar-Python and Histogrammar-Scala? It would be less cumbersome if it didn't have to fetch Scala JARs to function—if it can be made pure Python, that would be an improvement.
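To make the pure-Python point concrete, here is a minimal sketch of the kind of histogram that needs neither JARs nor C++: it is filled one value at a time and merged pairwise, which is the shape of aggregation Spark expects. This is not Histogrammar's, boost-histogram's, or hist's API; the class `Hist1D`, its methods, and the helper `fill_partition` are all made up for illustration, and the partitions are plain Python lists standing in for Spark partitions.

```python
from functools import reduce


class Hist1D:
    """Hypothetical fixed-width 1D histogram (illustration only)."""

    def __init__(self, num_bins, low, high):
        self.num_bins, self.low, self.high = num_bins, low, high
        self.counts = [0] * num_bins
        self.underflow = 0
        self.overflow = 0

    def fill(self, x):
        """Add one value in place and return self, so it can serve as a fold step."""
        if x < self.low:
            self.underflow += 1
        elif x >= self.high:
            self.overflow += 1
        else:
            width = (self.high - self.low) / self.num_bins
            # Clamp to guard against floating-point rounding at the upper edge.
            index = min(int((x - self.low) / width), self.num_bins - 1)
            self.counts[index] += 1
        return self

    def merge(self, other):
        """Return a new histogram holding the bin-wise sum of self and other."""
        if (self.num_bins, self.low, self.high) != (other.num_bins, other.low, other.high):
            raise ValueError("cannot merge histograms with different binning")
        out = Hist1D(self.num_bins, self.low, self.high)
        out.counts = [a + b for a, b in zip(self.counts, other.counts)]
        out.underflow = self.underflow + other.underflow
        out.overflow = self.overflow + other.overflow
        return out


def fill_partition(values):
    """Fill a fresh histogram from one partition's values (assumed binning)."""
    hist = Hist1D(4, 0.0, 4.0)
    for value in values:
        hist.fill(value)
    return hist


# Plain lists stand in for Spark partitions in this sketch.
partitions = [[0.5, 1.5, 1.7], [2.2, 3.9, 4.5], [-1.0, 0.1]]
total = reduce(Hist1D.merge, map(fill_partition, partitions))
print(total.counts, total.underflow, total.overflow)
```

Because `fill` returns the histogram and `merge` is associative, the same two operations should slot into PySpark's `rdd.aggregate(zeroValue, seqOp, combOp)` as the `seqOp` and `combOp`. That is an assumption about how you would wire it up, not a description of what Histogrammar does today.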
I hope this explanation helps!
Hi Jim,
Thanks for the quick response and the links. That's too bad; I had not seen any message that Histogrammar had been deprecated. (Indeed, this Git repo is still open.)
We typically fill the histograms with Spark and use the resulting histograms in Python. (In fact, I have never touched histogrammar-scala; we just use the JAR file. But I do appreciate the multi-language support.) Do you happen to be aware of any other histogramming package that supports Spark and has Python bindings?
Thanks, -Max
What I meant by "using Scala" is "using the JARs." It may be an unnecessary complication.
For Spark, I know that the Coffea project regularly makes histograms in Spark, returning them in Python. This is a later incarnation of the CMS Big Data Project that motivated Histogrammar. You might want to talk to @nsmith- about how they use Spark—they have their physics code in a form that lets you swap out different backends, so that the same code works in Spark, Condor, etc.
I figured out what had happened: I archived my old repos under diana-hep, but missed the Histogrammar ones because they were under a different GitHub organization. When I archive this, we probably won't be able to use the chat anymore, but you're welcome to respond by email: "pivarski" at my princeton.edu address.
Let me take a look at Coffea. Would you mind not archiving this yet? Let me contact you later. Thanks.
Hello Jim, all,
I was wondering if someone could give me a (short) update on the status of histogrammar and histogrammar-python. I see there has not been much activity lately, so I want to check whether it is still (actively) supported.
At my job we've been using histogrammar to do model performance monitoring. (There are many things we like about it, but in particular the Spark support.) I implemented several monkey patches to make the Python histograms easier to work with, and also some fixes to the persistence of multi-dimensional histograms. I'd be more than happy to clean these up and make a pull request for this, but it would be nice to know the status before I do that :-)
Thanks! -Max