scikit-hep / histbook

Versatile, high-performance histogram toolkit for Numpy.
BSD 3-Clause "New" or "Revised" License

Plotting histos and saving to pdf/png in script #45

Open professor-calculus opened 6 years ago

professor-calculus commented 6 years ago

Hi,

I've been trying Histbook for some experimental analysis and it seems very good! However I'd like to run over data samples and create plots of e.g. signal vs background(s) for many plottable variables, and save them to png/pdf.

As it is, I can do this via VegaLite using a web browser, by pressing save, but cannot seem to find a way to save directly to a file in the script without opening a browser, so that one can run e.g. in a headless gnu screen session or some sort of cluster/batch mode.

Is there a simple way plots can be exported/saved in Histbook without going through a web browser?

Thanks, Alex

jpivarski commented 6 years ago

This is one of the reasons for VegaScope. That still goes through a web browser, but it can invoke the browser to save as PNG/SVG (use Inkscape or equivalent to convert SVG to PDF). Pop-ups need to be unblocked for VegaScope to have permissions to do that.

It might be a better option to install [vega-lite as an npm package](https://www.npmjs.com/package/vega-lite), which of course means running nodejs, a non-browser JavaScript environment. This surely has vega-lite to PNG/SVG converters, and there are npm packages that do SVG to PDF conversion.

As you can see, I haven't figured out what would be a good workflow for this, but there are some leads. I don't know if nodejs would be considered an installation burden: it steps out of the Python ecosystem, but it's a well-maintained ecosystem on its own. (A lot of web services use it on the backend to unify server and client code.)
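To make the nodejs lead concrete, here is a rough sketch of driving a command-line converter from a Python script. It assumes the npm package puts a `vl2png` command on the PATH that takes a spec file and an output file; that is an assumption about the local install, so check it first. The spec is written by hand for already-binned data and is not output from histbook, and `export_png` is a hypothetical helper name.

```python
import json
import shutil
import subprocess
from pathlib import Path

# Hand-written Vega-Lite spec for already-binned data (illustrative values,
# not produced by histbook).
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [
        {"x": -1.0, "count": 3},
        {"x": 0.0, "count": 10},
        {"x": 1.0, "count": 4},
    ]},
    "mark": {"type": "line", "interpolate": "step"},
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "count", "type": "quantitative"},
    },
}


def export_png(spec, stem):
    """Write the spec to <stem>.vl.json and try to convert it to <stem>.png.

    Returns (command, converted). The conversion is skipped when the
    assumed `vl2png` executable is not found on the PATH.
    """
    spec_path = Path(f"{stem}.vl.json")
    spec_path.write_text(json.dumps(spec, indent=2))
    cmd = ["vl2png", str(spec_path), f"{stem}.png"]  # assumed CLI and argument order
    if shutil.which(cmd[0]) is None:
        return cmd, False
    result = subprocess.run(cmd)
    return cmd, result.returncode == 0


cmd, converted = export_png(spec, "example_hist")
print(" ".join(cmd), "-> converted" if converted else "-> converter not run or failed")
```

Since nothing here opens a browser, the same call should work inside a screen session or a batch job, provided nodejs and the converter are installed on the worker.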

I'm going to leave this issue open, hoping to collect ideas.

Oh, there's another option: histbook Hists can be turned into Pandas DataFrames as easily as vega-lite, and Pandas plots with Matplotlib. I didn't want to bring this dependency into histbook itself (outputting pure JSON is zero-dependency), but you can get to it through Pandas. Just be sure to use the line-drawing and bar chart features of Pandas plotting; "histogram" features aren't relevant because the data passed to Pandas are already aggregated into bins.
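A minimal sketch of the Pandas route, assuming Matplotlib's non-interactive `Agg` backend so that no display or browser is needed. The DataFrame is a stand-in built with `numpy.histogram`; with histbook, the bin counts would come from the Hist instead. The column names and file names are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: works headless, e.g. in batch jobs

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for already-aggregated histogram contents (not real histbook output).
rng = np.random.default_rng(0)
edges = np.linspace(-3.0, 3.0, 13)  # 12 equal-width bins
signal, _ = np.histogram(rng.normal(0.5, 0.7, 1000), bins=edges)
background, _ = np.histogram(rng.normal(-0.5, 1.2, 5000), bins=edges)

# Index the counts by bin center so the x axis is meaningful.
centers = 0.5 * (edges[:-1] + edges[1:])
df = pd.DataFrame(
    {"signal": signal, "background": background},
    index=pd.Index(centers, name="x"),
)

# Line drawing with a step style, not DataFrame.hist(): the data are
# already binned, so re-histogramming them would be wrong.
ax = df.plot(drawstyle="steps-mid")
ax.set_ylabel("entries per bin")

fig = ax.get_figure()
for ext in ("png", "pdf"):
    fig.savefig(f"signal_vs_background.{ext}")
plt.close(fig)
```

Matplotlib picks the output format from the file extension, so the same figure can be written as both PNG and PDF without an SVG conversion step.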