scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

Automatic path detection for workspace conversion from xmls #2224

Status: open. alexander-held opened this issue 1 year ago.

alexander-held commented 1 year ago

Summary

I frequently deal with RooFit workspaces stored in .root format[^1] that still contain the HistFactory measurement object. Calling ->PrintXML() on that object produces an xml+ROOT representation of the workspace, which can then be converted with pyhf xml2json.

After exporting with PrintXML(), the xmls almost always (I do not think I have seen a single exception) contain absolute paths to the histograms in the original .root file, and those paths point to wherever the user who created the workspace kept it at the time. In practice, this means I have to search and replace all of those absolute path prefixes with an empty string so that only a relative path to the .root file remains. After that, pyhf xml2json runs successfully.
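For reference, the manual search-and-replace step can be scripted. This is only an illustration: `strip_prefix_in_xmls` is a hypothetical helper (not part of pyhf), and it assumes that all exported xmls sit somewhere below one directory and that the stale prefix is known exactly.

```python
from pathlib import Path


def strip_prefix_in_xmls(xml_dir, prefix):
    """Remove `prefix` from every .xml file below xml_dir, in place.

    Hypothetical helper, not part of pyhf. Returns the paths (relative to
    xml_dir, POSIX style) of the files that were actually modified.
    """
    root = Path(xml_dir)
    changed = []
    # rglob also picks up channel xmls written into subdirectories
    for xml_file in sorted(root.rglob("*.xml")):
        text = xml_file.read_text()
        new_text = text.replace(prefix, "")
        if new_text != text:
            xml_file.write_text(new_text)
            changed.append(xml_file.relative_to(root).as_posix())
    return changed
```

For example, `strip_prefix_in_xmls(".", "/some/path/to/the/file/from/when/workspace/was/created/")` would leave only `workspace.root` in the references, provided the trailing slash is included in the prefix.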

This is a very common pattern for me, and while the search and replace across all the xmls is relatively quick, I think it would be very convenient to have a "magic" option that makes the conversion utility fall back to other detection modes when the specified path is wrong. In the simplest case, stripping everything from the path except the file name (so that the file is found in the same directory) would already be very useful, in my opinion.
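A minimal sketch of that simplest fallback, assuming a hypothetical helper `resolve_input_path` (pyhf has no such function): use the recorded path if it exists, otherwise look for a file with the same name in a given directory.

```python
from pathlib import Path


def resolve_input_path(recorded_path, search_dir="."):
    """Return a usable path for a file referenced in the XML.

    Hypothetical fallback logic, not an existing pyhf feature:
    1. use the recorded path if it exists;
    2. otherwise try a file with the same name inside search_dir.
    Raises FileNotFoundError if neither exists.
    """
    recorded = Path(recorded_path)
    if recorded.exists():
        return recorded
    # Drop the stale directory part and keep only the file name
    candidate = Path(search_dir) / recorded.name
    if candidate.exists():
        return candidate
    raise FileNotFoundError(
        f"file not found: {recorded_path!r} (also tried {str(candidate)!r})"
    )
```

Only trying the file name keeps the fallback predictable; smarter searches (e.g. walking up or down directories) could pick the wrong file if several share a name.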

How do you feel about an optional new feature like that?

[^1]: The reason I more often start from these than from xml + ROOT files is that sharing a single file is often easier than sharing many, and the overhead of calling the conversion is small.

Additional Information

N/A


kratsg commented 1 year ago

How do you feel about an optional new feature like that?

Can't you use the -v/--mount option of pyhf xml2json here? That's what it was for: it allows you to strip out the prefixes. See
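To illustrate what stripping the prefixes amounts to, here is a rough re-implementation of the idea, not pyhf's actual code. It assumes the LOCAL:XML_PATH form (local directory first, prefix recorded in the XML second); `parse_mount` and `apply_mounts` are hypothetical names.

```python
from pathlib import Path


def parse_mount(spec):
    """Split a 'LOCAL:XML_PATH' string into (local, xml_path) Paths.

    Splits on the first colon only, so Windows drive letters are not handled.
    """
    local, sep, xml_path = spec.partition(":")
    if not sep or not xml_path:
        raise ValueError(f"expected LOCAL:XML_PATH, got {spec!r}")
    return Path(local), Path(xml_path)


def apply_mounts(recorded_path, mounts):
    """Rewrite recorded_path with the first mount whose XML_PATH is a prefix of it."""
    recorded = Path(recorded_path)
    for local, xml_path in mounts:
        try:
            relative = recorded.relative_to(xml_path)
        except ValueError:
            continue  # this mount does not cover the recorded path
        return local / relative
    return recorded  # no mount matched: keep the path unchanged
```

With the mount `.:/old/prefix/`, a recorded `/old/prefix/ws.root` becomes `ws.root`, i.e. a file in the current directory.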

alexander-held commented 1 year ago

That does indeed work; I was not aware I could use it like that. Essentially, it means first getting an error

FileNotFoundError: file not found

    '/some/path/to/the/file/from/when/workspace/was/created/workspace.root'

and then using

pyhf xml2json -v .:/some/path/to/the/file/from/when/workspace/was/created/ workspace.xml > ws.json
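The mount argument can also be derived directly from the path in the error message. A sketch with hypothetical helpers `mount_spec_for` and `xml2json_command` (not part of pyhf) that map the directory of the missing file onto the current directory and build the argument list for the command above:

```python
import posixpath
import shlex


def mount_spec_for(missing_path, local_dir="."):
    """Map the directory of a missing file onto local_dir as 'LOCAL:XML_PATH/'."""
    xml_dir = posixpath.dirname(missing_path).rstrip("/") + "/"
    return f"{local_dir}:{xml_dir}"


def xml2json_command(missing_path, config="workspace.xml"):
    """Build argv for `pyhf xml2json -v LOCAL:XML_PATH config` (hypothetical helper)."""
    return ["pyhf", "xml2json", "-v", mount_spec_for(missing_path), config]


if __name__ == "__main__":
    cmd = xml2json_command(
        "/some/path/to/the/file/from/when/workspace/was/created/workspace.root"
    )
    # Print a shell-ready version of the command
    print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, stdout=...)` to write the JSON to a file, which is roughly the "automatic" variant, minus the retry on the first error.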

It is not quite as convenient as doing it automatically, but it's not terribly complicated either.