scikit-hep / cabinetry

design and steer profile likelihood fits
https://cabinetry.readthedocs.io/
BSD 3-Clause "New" or "Revised" License
26 stars 21 forks source link

negative yerr not handled by cabinetry.visualize.data_mc #388

Closed rmnmllr closed 1 year ago

rmnmllr commented 1 year ago

Probably a minor issue and more of a error handling request, but while trying to visualize the prefit plots I receive an error from matplotlib that yerr can't contain negative values:

Traceback (most recent call last):
  File "/terabig/muellerr/Leptoquarks/fitting/run_cabinetry.py", line 333, in <module>
    run_cabinetry(args["inputs"], args["variables"], args["couplings"], gen_input, plot, limit, ranking, args["signi"])
  File "/terabig/muellerr/Leptoquarks/fitting/run_cabinetry.py", line 178, in run_cabinetry
    cabinetry.visualize.data_mc(prediction_prefit, data, config=config, figure_folder=figurespath, log_scale=True, close_figure=True)
  File "/terabig/muellerr/.conda/envs/fit/lib/python3.9/site-packages/cabinetry/visualize/__init__.py", line 236, in data_mc
    fig = plot_model.data_mc(
  File "/terabig/muellerr/.conda/envs/fit/lib/python3.9/site-packages/cabinetry/visualize/plot_model.py", line 174, in data_mc
    ax2.errorbar(
  File "/terabig/muellerr/.conda/envs/fit/lib/python3.9/site-packages/matplotlib/__init__.py", line 1423, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/terabig/muellerr/.conda/envs/fit/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 3587, in errorbar
    raise ValueError(
ValueError: 'yerr' must not contain negative values

This is probably related to few statistics in the selected region, but can this be logged and or be catched somewhere?

alexander-held commented 1 year ago

This looks to be happening in the ratio panel, where the yerr kwarg is filled with the Poisson uncertainty on data divided by the total model prediction. The former should be strictly positive, so the pre-fit model prediction for a given bin must be negative here. That's unphysical so we should probably catch this even earlier, for example in model_utils.prediction. Do you expect that the model prediction may be negative? Perhaps this is a consequence of initial parameter values not having suitable values? (There may also be bugs with pre-fit model predictions in combinations with custom modifiers #382, that has not been tested much.)

rmnmllr commented 1 year ago

Thank you @alexander-held!

Do you expect that the model prediction may be negative? Perhaps this is a consequence of initial parameter values not having suitable values?

No the model prediction is not expected to be negative, but I see in the prefit yield table that for one specific region the total is indeed negative :/ We have to go back to the inputs then..

(There may also be bugs with pre-fit model predictions in combinations with custom modifiers https://github.com/scikit-hep/cabinetry/issues/382, that has not been tested much.)

For this error above I used the latest stable version of cabinetry, custom modifiers are not included.

rmnmllr commented 1 year ago

I made it work, it was indeed a consequence of initial paramater values not having suitable values. For my signal I changed the Nominal in NormFactors from:

NormFactors:
  - Name: "Signal_norm"
    Samples: "LQ"
    Nominal: 1
    Bounds: [0, 30]

to

NormFactors:
  - Name: "Signal_norm"
    Samples: "LQ"
    Nominal: 0
    Bounds: [0, 30]

This then also sets my signal LQ to 0.00 in the yields resulting the total to be positive.

alexander-held commented 1 year ago

Is the signal sample predicting a negative number of events? I do not see how otherwise the nominal value of 1 could be causing issues.

rmnmllr commented 1 year ago

Yes they are negative interference terms. The yield table for the mention region for Nominal: 0 looks like:

sample        rec_smim_vs_Nbjet_HighMassOSs2thh_1bjets  
----------    ------------------------------------------
LQ            0.00 ± 0.00                               
Wjets         0.15 ± 0.02                               
Zll           0.00 ± 0.00                               
Ztautau       0.66 ± 0.07                               
multiboson    0.05 ± 0.00                               
singletop     0.44 ± 0.04                               
ttbar         2.31 ± 0.92                               
total         3.61 ± 0.92                               
data          3.61                                        

while for Nominal: 1:

sample        rec_smim_vs_Nbjet_HighMassOSs2thh_1bjets  
----------    ------------------------------------------
LQ            -7.97 ± 0.16                              
Wjets         0.15 ± 0.02                               
Zll           0.00 ± 0.00                               
Ztautau       0.66 ± 0.07                               
multiboson    0.05 ± 0.00                               
singletop     0.44 ± 0.04                               
ttbar         2.31 ± 0.92                               
total         -4.36 ± 0.93                              
data          3.61                                      

It's the only region where the total drops below 0, for other regions there is enough background to stay in the positive. For my understanding, is it wrong to use Nominal: 0 or do I kill the signal with it?

alexander-held commented 1 year ago

Hi @rmnmllr, sorry I forgot to reply here earlier: Nominal: 0 only sets the initial value for the parameter, the best-fit value will then still be determined in the fit so this is fine to use.

rmnmllr commented 1 year ago

hey @alexander-held, no worries -- thanks for taking the time!

After a while we have a new set of inputs and now again I stumble upon the ValueError: 'yerr' must not contain negative values. This time though the ValueError happens at the plotting of the syst. templates cabinetry.visualize.templates. The error message including some additional logger info is:

INFO - cabinetry.visualize.plot_model - label = region: HighMassOSs2thh_1bjets
sample: Zll
systematic: FT_EFF_Eigen_B_0
INFO - cabinetry.visualize.plot_model - nominal_histo['yields']=[9.999999717180685e-10, 9.999999717180685e-10, 0.030537374317646027, 0.04678068682551384, -0.09997741878032684, 0.03829693794250488, 0.004840612877160311, 9.999999717180685e-10, 9.999999717180685e-10]
INFO - cabinetry.visualize.plot_model - template['yields'] =    [9.999999717180685e-10, 9.999999717180685e-10, 0.030371561646461487, 0.04616935923695564, -0.09442667663097382, 0.03830915689468384, 0.004814603365957737, 9.999999717180685e-10, 9.999999717180685e-10]
INFO - cabinetry.visualize.plot_model - template_ratio_plot =   [1.0, 1.0, 0.9945701726199582, 0.986932051877768, 0.944480041422661, 1.0003190582024417, 0.9946268144421762, 1.0, 1.0]
INFO - cabinetry.visualize.plot_model - template['stdev'] =     [0.0, 0.0, 0.026351320503366565, 0.03463904655528507, 0.11993040470068965, 0.038309157288696605, 0.0036683593526272563, 0.0, 0.0]
INFO - cabinetry.visualize.plot_model - yerr =                  [0.0, 0.0, 0.8629203096920959, 0.7404561349106399, -1.1995749256560029, 1.0003190684908039, 0.7578295240125165, 0.0, 0.0]
Traceback (most recent call last):
  File "/terabig/muellerr/Leptoquarks/fitting/run_cabinetry.py", line 356, in <module>
    run_cabinetry(args["inputs"], args["variables"], args["couplings"], gen_input, plot, limit, ranking, args["signi"], args["mlfit"], args["prune_MC"], args["custom_function"])
  File "/terabig/muellerr/Leptoquarks/fitting/run_cabinetry.py", line 97, in run_cabinetry
    cabinetry.visualize.templates(config, figure_folder=figurespath, close_figure=True)
  File "/terabig/muellerr/.conda/envs/fit-dev/lib/python3.9/site-packages/cabinetry/visualize/__init__.py", line 362, in templates
    fig = plot_model.templates(
  File "/terabig/muellerr/.conda/envs/fit-dev/lib/python3.9/site-packages/cabinetry/visualize/plot_model.py", line 365, in templates
    ax2.errorbar(
  File "/terabig/muellerr/.local/lib/python3.9/site-packages/matplotlib/__init__.py", line 1442, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/terabig/muellerr/.local/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 3642, in errorbar
    raise ValueError(
ValueError: 'yerr' must not contain negative values

Now for fixing this, I took the absolute value of nominal_histo["yields"] on line 357 e.g.:

yerr=template["stdev"] / abs(nominal_histo["yields"]),

which makes it work.

Do you see any dealbreaker by doing this? At least now I can get some plots, cabinetry later fails at cabinetry.model_utils.prediction with a RuntimeError: [True, False, False, False, False, False, False, False, False] is neither all-True nor all-False, so not compressible but I need to investigate more there before I come (again) back to you 😅

alexander-held commented 1 year ago

RuntimeError: [True, False, False, False, False, False, False, False, False] is neither all-True nor all-False, so not compressible

This probably comes from #390. I have an idea of how to fix that, so hopefully I can get that done soon and release a new version.

Fixing the negative error issue by taking the absolute value at that point seems reasonable to me. Please feel free to submit a PR with this change (perhaps with np.abs instead to make it more obvious that this is an array)!

alexander-held commented 1 year ago

With #390 addressed via #396 and the changes introduced in #394, I believe everything here is addressed in the latest release v0.5.2. Please feel free to open this again (or a new issue) if you run into additional issues!