scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0

xml2json not working with empty/missing <Data> which should be optional #618

Open beojan opened 5 years ago

beojan commented 5 years ago

Description

xml2json isn't working. With a single-channel file, an empty JSON skeleton is returned:

╭─   beojan   ~/DPhil/hh4b/Resolved/Limits                                                                                                                         SIGINT(2) ↵  54.75M   11:59:30 
╰─ pyhf xml2json meas_scalar_251_xml/Measurement_resolved_4b_2017.xml
0channel [00:00, ?channel/s]
{
    "channels": [],
    "measurements": [],
    "observations": [],
    "version": "1.0.0"
}

With the multi-channel file, I get an IsADirectoryError:

IsADirectoryError: [Errno 21] Is a directory: '/home/beojan/DPhil/hh4b/Resolved/Limits/'
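The trailing slash on the reported path hints that an empty file name was joined onto the directory holding the XML. A minimal sketch of that mechanism, assuming (this is not confirmed anywhere in the thread) that the converter builds the ROOT file path with `os.path.join(rootdir, input_file)`; `resolve_input_path` and `open_error_name` are hypothetical helpers:

```python
import os
import tempfile
from typing import Optional


def resolve_input_path(rootdir: str, input_file: str) -> str:
    # Hypothetical path construction (an assumption, not pyhf's confirmed code):
    # join the directory holding the XML with the <Data> InputFile attribute.
    return os.path.join(rootdir, input_file)


def open_error_name(path: str) -> Optional[str]:
    # Try to open the path as a regular file; return the exception class name on failure.
    try:
        with open(path, "rb"):
            return None
    except OSError as err:
        return type(err).__name__


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as rootdir:
        path = resolve_input_path(rootdir, "")  # models InputFile="" in the XML
        print(path)                   # the directory itself, with a trailing separator
        print(open_error_name(path))  # IsADirectoryError on Linux
```

On Windows the same call raises `PermissionError` instead, so the exact exception depends on the platform.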

Expected Behavior

In each case, xml2json should return JSON describing the full model.

Actual Behavior

An empty JSON skeleton is returned, or an IsADirectoryError is thrown.
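Because the single-channel case produces a well-formed but empty skeleton rather than an error, a script that chains xml2json into a fit can catch it early. A minimal sketch using only the standard library; `empty_sections` is a hypothetical helper, and the section names are taken from the skeleton shown above:

```python
import json
from typing import List

# Top-level sections that a usable workspace from xml2json should populate.
REQUIRED_SECTIONS = ("channels", "measurements", "observations")


def empty_sections(spec: dict) -> List[str]:
    """Return the names of required top-level sections that are missing or empty."""
    return [key for key in REQUIRED_SECTIONS if not spec.get(key)]


if __name__ == "__main__":
    # The skeleton reported above for the single-channel file.
    skeleton = json.loads(
        '{"channels": [], "measurements": [], "observations": [], "version": "1.0.0"}'
    )
    problems = empty_sections(skeleton)
    if problems:
        raise SystemExit(f"xml2json produced empty sections: {', '.join(problems)}")
```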

Steps to Reproduce

  1. Write a HistFactory measurement to a ROOT file with meas.writeToFile
  2. Load this file in a new ROOT session and write the XML: meas->PrintXML("meas")
  3. Attempt to convert to JSON: pyhf xml2json meas/meas.xml

pyhf is installed in an otherwise empty virtualenv with Python 3.7.4. The pyhf version is 0.1.2, but the problem also appears after installing from git master.

lukasheinrich commented 5 years ago

Thanks @beojan for the report. Can you make the XML and ROOT files available somewhere public? For a toy example there should not be any access issues, so there's no need to use an ATLAS-internal example. We can then debug. It's probably some detail that we're not covering for that specific measurement.

beojan commented 5 years ago

Attached. I generated dummy data, but the structure is the same. Note that the archive has no top-level directory; it contains just a ROOT file and the XML file.

test.tar.gz

beojan commented 5 years ago

The combination XML is also attached here. Measurement.xml.txt

In the combination case, the XML doesn't even load, so it's probably not a problem with the specific measurement.

lukasheinrich commented 5 years ago

Hi @beojan, the reason is that InputFile is empty for the <Data> section. Is this intentional? (It could be related to #566.)
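To find which channels are affected before running the conversion, the channel XML files can be scanned for `<Data>` elements without a usable InputFile. A minimal sketch with `xml.etree.ElementTree`; `data_without_input_file` is a hypothetical helper, and the fallback rule (an absent attribute inherits the `<Channel>` InputFile, while an explicit `InputFile=""` does not) is an assumption about how the file is interpreted:

```python
import xml.etree.ElementTree as ET
from typing import List


def data_without_input_file(channel_xml: str) -> List[str]:
    """Return the HistoName of each <Data> element that has no usable InputFile."""
    channel = ET.fromstring(channel_xml)
    channel_input = channel.get("InputFile", "")
    flagged = []
    for data in channel.iter("Data"):
        # Assumed rule: an absent attribute falls back to the <Channel> InputFile,
        # but an explicit InputFile="" is kept as-is (the case in this issue).
        input_file = data.get("InputFile", channel_input)
        if not input_file.strip():
            flagged.append(data.get("HistoName", "<unnamed>"))
    return flagged


if __name__ == "__main__":
    # Hypothetical blinded channel, modelled on the structure described above.
    blinded = (
        '<Channel Name="SR" InputFile="hists.root">'
        '<Data HistoName="data" InputFile="" HistoPath="" />'
        '<Sample Name="bkg" HistoName="bkg" />'
        "</Channel>"
    )
    print(data_without_input_file(blinded))
```

Run it on each channel XML referenced from the top-level measurement file, since that is the file xml2json should be given.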

beojan commented 5 years ago

Yes, because the measurement is blinded (I'm just looking for expected limits).

lukasheinrich commented 5 years ago

Thanks. Can you (just as a test) try adding a dummy data histogram (with empty data, e.g. all-zero counts) and see if that works? To be clear, you should always run xml2json on the measurement XML, not on the individual channels.

beojan commented 5 years ago

I reused the background histogram as the data, and it works. Thanks.
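The workaround above can be applied to the XML directly instead of by hand. A minimal sketch with `xml.etree.ElementTree`; `fill_blinded_data` is a hypothetical helper, and `hists.root` and `bkg` are placeholder file and histogram names, not taken from the attached files:

```python
import xml.etree.ElementTree as ET


def fill_blinded_data(channel_xml: str, input_file: str, histo_name: str,
                      histo_path: str = "") -> str:
    """Point every <Data> element lacking an InputFile at a stand-in histogram."""
    channel = ET.fromstring(channel_xml)
    channel_input = channel.get("InputFile", "")
    for data in channel.iter("Data"):
        # Only touch <Data> elements with no usable InputFile (explicit or inherited).
        if not data.get("InputFile", channel_input).strip():
            data.set("InputFile", input_file)
            data.set("HistoName", histo_name)
            if histo_path:
                data.set("HistoPath", histo_path)
    return ET.tostring(channel, encoding="unicode")


if __name__ == "__main__":
    blinded = (
        '<Channel Name="SR" InputFile="hists.root">'
        '<Data HistoName="" InputFile="" />'
        '<Sample Name="bkg" HistoName="bkg" />'
        "</Channel>"
    )
    # Reuse the background histogram as stand-in data, as done in the thread.
    print(fill_blinded_data(blinded, input_file="hists.root", histo_name="bkg"))
```

The stand-in is only a placeholder so the conversion can proceed; anything reported as "observed" then reflects the background histogram, not real data. ElementTree also drops the `<!DOCTYPE>` line when re-serializing, so add it back if another tool needs it.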

kratsg commented 4 years ago

./cc @alexander-held who ran into this issue as well.

alexander-held commented 4 years ago

related: #566