iris-hep / analysis-grand-challenge

Repository dedicated to AGC preparations & execution
https://agc.readthedocs.io
MIT License

Histogram validation and bin migrations #168

Closed alexander-held closed 1 year ago

alexander-held commented 1 year ago

The CMS ttbar setup provides a histogram validation script and a reference file. In some cases the observable calculated for an event seems to lie extremely close to a bin boundary, and the event may then migrate between bins. This is presumably due to floating-point arithmetic, and it can happen even with the same implementation when running on different machines. It can look like this:

4j2b_single_top_t_chan_pt_scale_up
        Contents do not match:
        got      [18.34436054206814, 191.94135120855094, 616.3569931138268, 1158.3923520829348, 1829.0405553414591, 2620.8829280746218, 3549.76845738687, 3981.8725441002866, 3059.056839636495, 2392.6575137718214, 2143.929579980845, 1915.501838183115, 1729.900082816315, 1598.0499986077054, 1437.671737611248, 1326.0544028975658, 1230.6907102781418, 1101.8080959954548, 1029.7795077935136, 962.6742219686713, 882.1478789788599, 816.2565581237128, 752.7931672079566, 705.7857458814076, 652.1689597198491, 593.0218886963431, 9634.026000505984]
        expected [18.34436054206814, 191.94135120855094, 616.3569931138268, 1158.3923520829348, 1829.0405553414591, 2620.8829280746218, 3549.76845738687, 3981.8725441002866, 3059.056839636495, 2392.6575137718214, 2143.8621374825284, 1915.5692806814316, 1729.900082816315, 1598.0499986077054, 1437.671737611248, 1326.0544028975658, 1230.6907102781418, 1101.8080959954548, 1029.7795077935136, 962.6742219686713, 882.1478789788599, 816.2565581237128, 752.7931672079566, 705.7857458814076, 652.1689597198491, 593.0218886963431, 9634.026000505984]

where an event migrates between two bins:

observed: 2143.929579980845, 1915.501838183115
reference: 2143.8621374825284, 1915.5692806814316
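As a toy illustration of the presumed mechanism (not taken from the AGC code), the sketch below fills the same nominal value into a histogram twice, once as a literal and once as the result of an addition. The bin edges and values are made up for the example; the only assumption is NumPy's `np.histogram` with explicit edges.

```python
import numpy as np

# Hypothetical bin edges; 0.8 is the boundary between bin 0 and bin 1.
edges = np.array([0.0, 0.8, 1.6])

# Two ways of arriving at "0.8": a literal and a sum of two floats.
value_literal = 0.8
value_computed = 0.7 + 0.1  # rounds to just below 0.8 in double precision

counts_literal, _ = np.histogram([value_literal], bins=edges)
counts_computed, _ = np.histogram([value_computed], bins=edges)

print(counts_literal)   # event lands in the upper bin
print(counts_computed)  # event lands in the lower bin
```

A difference in the last bit of the observable is enough to move the full event weight from one bin to its neighbour, which is why a per-bin tolerance has to be as large as a single event weight to absorb it.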

In such cases the tolerance can be increased to compensate, but we might want to have a better method to spot this. Importantly, partial sums of the counts should still match in case of migrations.
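One possible way to spot this, sketched here under assumptions and not necessarily what the validation script does: compare the bins directly first, and if they disagree, compare the cumulative sums. The function name `classify_histogram_difference` and the default `rtol` are hypothetical. If the total still agrees, the content was only redistributed, and the indices where the cumulative sums disagree mark the bin boundaries that were crossed.

```python
import numpy as np

def classify_histogram_difference(got, expected, rtol=1e-9):
    """Return (status, boundaries) for two histograms with identical binning.

    status is "match", "migration" or "mismatch". For a migration,
    boundaries lists each index i where content crossed the edge
    between bin i and bin i + 1.
    """
    got = np.asarray(got, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if got.shape != expected.shape:
        raise ValueError("histograms must have the same number of bins")

    # Plain bin-by-bin comparison.
    if np.isclose(got, expected, rtol=rtol, atol=0.0).all():
        return "match", []

    # Partial sums: these agree again once a migration is "closed".
    cum_ok = np.isclose(np.cumsum(got), np.cumsum(expected), rtol=rtol, atol=0.0)

    # If the totals differ, yield was gained or lost, not just moved.
    if not cum_ok[-1]:
        return "mismatch", []

    return "migration", np.flatnonzero(~cum_ok).tolist()

# The two affected bins from the example above, with one neighbour on each side.
got = [2392.6575137718214, 2143.929579980845, 1915.501838183115, 1729.900082816315]
expected = [2392.6575137718214, 2143.8621374825284, 1915.5692806814316, 1729.900082816315]
status, boundaries = classify_histogram_difference(got, expected)
print(status, boundaries)
```

Note that this reports any redistribution with a matching total as a migration, including moves between distant bins, so a caller may want to additionally require that only a few isolated boundaries are reported.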

cc @eguiraud @ekauffma

Some details also in https://github.com/iris-hep/analysis-grand-challenge/pull/163#issuecomment-1608403618.

ekauffma commented 1 year ago

Should this be closed by #171 or is there further work to be done here?

alexander-held commented 1 year ago

Indeed this is addressed by #171, thanks for following up here.