Signal overlay in data/MC plots

Tomoya-Iizawa commented 1 year ago

Hello,

I would like to plot the pre-fit distributions with the signal overlaid on the background, and also to scale the signal. Is there an option to do this?

I am implementing the plotting part as follows:

    cabinetry.visualize.data_mc(prediction_prefit,
                                data,
                                config=config,
                                figure_folder=figurespath,
                                log_scale=True,
                                close_figure=True)
    cabinetry.visualize.data_mc(prediction_prefit,
                                data,
                                config=config,
                                figure_folder=figurespath,
                                log_scale=False,
                                close_figure=True)

Best Regards, Tomoya

alexander-held commented 1 year ago

Hi @Tomoya-Iizawa, you can scale signal events pre-fit by changing the nominal value of the normalization factor you attach to the signal. In the example config, this corresponds to the following: https://github.com/scikit-hep/cabinetry/blob/master/config_example.yml#L57.
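
As a rough sketch of that adjustment, the snippet below changes the Nominal value of a normalization factor in a config held as a dictionary. The NormFactors, Name and Nominal keys follow the linked example config; the helper name set_nominal and the inline dictionary are hypothetical stand-ins for a config loaded from file.

```python
from copy import deepcopy


def set_nominal(config, norm_factor_name, nominal):
    """Return a copy of config with a new pre-fit value for one norm factor."""
    updated = deepcopy(config)
    for norm_factor in updated.get("NormFactors", []):
        if norm_factor["Name"] == norm_factor_name:
            norm_factor["Nominal"] = nominal
            return updated
    raise KeyError(f"no NormFactor named {norm_factor_name!r}")


# hypothetical stand-in for a loaded cabinetry config
config = {
    "NormFactors": [
        {"Name": "Signal_norm", "Samples": "Signal", "Nominal": 1, "Bounds": [0, 10]}
    ]
}
scaled_config = set_nominal(config, "Signal_norm", 5)
```

The workspace would presumably need to be rebuilt from the updated config for the new value to enter the pre-fit prediction.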

When you talk about overlay, I assume you mean something like the red W' contribution in this example https://atlas.web.cern.ch/Atlas/GROUPS/PHYSICS/CONFNOTES/ATLAS-CONF-2021-043/fig_05a.png? There are two ways to draw this that I am aware of, normalized to total background or normalized to some reference cross-section.
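
The two normalization choices could be sketched as below. This is plain Python on lists of per-bin yields; the function name overlay_scale and the toy numbers are made up for illustration and are not part of cabinetry.

```python
def overlay_scale(signal_yields, background_yields=None, factor=1.0):
    """Scale factor for an overlaid signal.

    If background_yields is given, normalize the signal to the total
    background; otherwise apply the fixed factor (for example the ratio
    to some reference cross-section).
    """
    if background_yields is None:
        return factor
    total_signal = sum(signal_yields)
    if total_signal == 0:
        raise ValueError("cannot normalize an empty signal")
    return sum(background_yields) / total_signal


# toy per-bin yields
signal = [1.0, 2.0, 1.0]
background = [100.0, 200.0, 100.0]
scale = overlay_scale(signal, background)
overlay = [scale * s for s in signal]
```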

At the moment there is no built-in option to do this. It is conceptually not a difficult addition, but there are a few interface aspects that would be needed:

- a way to declare which samples are the signal (could use the POI if it exists and identify which samples it acts on),
- a switch for normalization (to total background),
- a way to turn the contribution of signal to the stack of histograms on and off(?),
- for post-fit plots, it is unclear how to handle the total uncertainty if the signal is not included in the stack (the uncertainty is not defined); for pre-fit plots, the updated total uncertainty needs to be accounted for.

The easiest way right now to overlay a signal would be to just add the overlaid contribution manually via matplotlib, as the relevant information is in the prediction_prefit object. An example of editing figures is in https://github.com/scikit-hep/cabinetry/issues/265#issuecomment-1638860581. The big advantage of doing it manually is that there is much more flexibility in styling.
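
A minimal sketch of such a manual overlay, assuming a matplotlib Axes object and per-bin signal yields are already at hand. The helper name overlay_signal, its arguments and the default label are hypothetical; only Axes.stairs is an actual matplotlib call.

```python
def overlay_signal(ax, signal_yields, bin_edges, scale=1.0, label=None):
    """Draw scaled signal yields as an unfilled dashed outline on ax."""
    # Axes.stairs needs exactly one more edge than there are values
    if len(bin_edges) != len(signal_yields) + 1:
        raise ValueError("need exactly one more bin edge than yields")
    values = [scale * y for y in signal_yields]
    if label is None:
        label = f"signal x {scale:g}"
    # stairs draws a step-wise outline; it is unfilled by default
    return ax.stairs(values, bin_edges, linestyle="dashed", label=label)
```

With cabinetry, the axes would come from the figure that data_mc produces (to my understanding it returns a list of dictionaries with a "figure" entry per region), and the signal yields from the model_yields of the prediction_prefit object.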

It would be interesting to get some feedback for how well this works if doing it manually and externally for this example, as I am generally curious to learn more about which kind of things would best be done outside of cabinetry and which things are most important to natively support.

Tomoya-Iizawa commented 1 year ago

Hi @alexander-held,

Thank you for your quick answer. I looked at the reference you pointed to, but I am not sure where the signal overlay is performed. In my case, the signal events end up at the bottom of the stack plot. How is the order of the stack defined? I would also like to change the legend entry to "signal x 100"; could you tell me where I can do that? (Sorry for the basic questions, I recently took over a cabinetry setup and am still becoming familiar with it.)

Best Regards, Tomoya

alexander-held commented 1 year ago

The stack order follows the sample order defined by pyhf, which is alphabetical. There is currently no way to change that.

For changing the legend: what shows up in the legend is the name of the sample you put in your cabinetry config (if you build your workspace with cabinetry) or the workspace itself otherwise. This means that you can rename your signal to "signal x 100" and it will show up like that in the plot. You can accordingly scale the signal by 100 (e.g. by applying a weight of 100 in the config).

The notebook did not include an overlay example, it was just an example for how to generally edit existing figures. It now does include an example of that though: https://gist.github.com/alexander-held/2ca63e4c4c3de2114bf8d903bf28bb4a. In addition, it shows how to re-do the legend, which you can also use to edit the text of the legend to anything you want.
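
Since the stack follows the alphabetical sample order, the position of the signal within the per-sample yields can be looked up by name. The sketch below uses toy lists; the helper name sample_yields and the sample names are made up, and the [channel][sample][bin] layout is my assumption about how model_yields is organized in a cabinetry prediction.

```python
def sample_yields(model_yields, sample_names, sample, channel_index=0):
    """Return per-bin yields of one sample in one channel.

    model_yields is indexed as [channel][sample][bin], with samples in the
    same order as sample_names.
    """
    return model_yields[channel_index][sample_names.index(sample)]


# hypothetical sample names, sorted alphabetically like the stack order
names = sorted(["Signal", "Background", "Other"])
yields = [[[50.0, 40.0], [10.0, 5.0], [1.0, 2.0]]]  # one channel, two bins
signal = sample_yields(yields, names, "Signal")
```

For a real prediction, the names would presumably come from prediction_prefit.model.config.samples.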

Tomoya-Iizawa commented 1 year ago

Hi @alexander-held,

Thank you for including the example! I am trying to implement it in my code, but it fails at the legend part with the following error.

    Traceback (most recent call last):
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/fitting/run_cabinetry.py", line 600, in <module>
        run_cabinetry(args["inputs"],
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/fitting/run_cabinetry.py", line 260, in run_cabinetry
        legend_handles = ax.get_legend().legend_handles + [stairs_container]
    AttributeError: 'Legend' object has no attribute 'legend_handles'

Do you have any idea how to fix this?

I also have another question: where is the overlaid signal added to the plot? When I comment out the legend part the code runs, but the signal is not overlaid. Are both the legend and the overlay added in that part?

Best Regards, Tomoya

alexander-held commented 1 year ago

You need a more recent version of matplotlib: it looks like the legend_handles attribute is available in matplotlib>=3.7.0.

I am not understanding the other question: ax.stairs adds the signal, ax.legend adds the legend.
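
For setups where matplotlib cannot be updated right away, a version-tolerant way of extending the legend could look like the sketch below. The helper name extend_legend is hypothetical; the fallback relies on older matplotlib releases exposing the same list under the name legendHandles.

```python
def extend_legend(ax, new_handle, new_label):
    """Re-draw the legend of ax with one extra entry appended."""
    legend = ax.get_legend()
    # matplotlib >= 3.7 exposes legend_handles; older releases used legendHandles
    handles = getattr(legend, "legend_handles", None)
    if handles is None:
        handles = legend.legendHandles
    labels = [text.get_text() for text in legend.get_texts()]
    return ax.legend(handles=list(handles) + [new_handle], labels=labels + [new_label])
```

Here new_handle would be the object returned by ax.stairs and new_label any text, for instance "signal x 100".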

Tomoya-Iizawa commented 1 year ago

OK, I will check the matplotlib version and try to update it. About the second question: I commented out everything from # update legend: get handles... to ax.legend(handles=legend_handles, labels=legend_labels) in your example. The error then no longer appears and the plots are created, but the overlaid signal events (i.e. the dashed line) do not appear.

alexander-held commented 1 year ago

Are you running the notebook as-is? It works fine for me and I do not know what could be causing problems on your end. Try with updated matplotlib and otherwise simplify the example to maybe isolate issues. I'm afraid I cannot help with general matplotlib support.

Tomoya-Iizawa commented 1 year ago

Hi,

I changed the matplotlib version from 3.7.1 to 3.7.2 and modified bin_edges, and now it works. Thank you for your help!

Best Regards, Tomoya

alexander-held commented 1 year ago

Glad to hear it's working! I am closing this as I think this is best done externally like in the example here. If this request comes up again we can consider how we can make this more convenient to users.

Tomoya-Iizawa commented 12 months ago

Hi,

This is just a question: is there any way not to draw the signal events in the stacked histogram? In the example you showed, I would like to draw only the green dashed line for the signal and remove the histogram filled in green.

Best, Tomoya

alexander-held commented 12 months ago

No, there's no built-in option for that. You can do it in the following way:

- build a fit.FitResults object corresponding to the pre-fit configuration (assuming you want pre-fit, otherwise you can skip that)
  - use model_utils.asimov_parameters for parameter values
  - use model_utils.prefit_uncertainties for parameter uncertainties
  - use a diagonal correlation matrix
- set the signal normalization parameter value to zero (your signal normalization is usually at the index given by model.config.poi_index)
- feed the fit.FitResults object you built to model_utils.prediction to get a prediction with zero signal
- update the legend to remove the signal label in there (skip the respective entries in ax.get_legend().legend_handles in my gist example)

Here is another gist showing you how to build custom model predictions: https://gist.github.com/alexander-held/9eb02d00986ebfbc908a887d8df64ef9.
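
The steps listed above could be sketched roughly as follows. This is not taken from the gist: the helper names are hypothetical, the keyword names passed to fit.FitResults and the use of model.config.par_names as an attribute (as in pyhf>=0.7) are assumptions about the installed versions, and best_twice_nll is only a placeholder value.

```python
def with_zero_signal(parameters, poi_index):
    """Copy of parameters with the signal normalization set to zero."""
    updated = list(parameters)
    updated[poi_index] = 0.0
    return updated


def prefit_results_without_signal(model):
    """Build pre-fit FitResults for a pyhf model with the POI at zero."""
    # imported here so the pure helper above works without these packages
    import numpy as np
    import cabinetry

    asimov = cabinetry.model_utils.asimov_parameters(model)
    bestfit = np.asarray(with_zero_signal(asimov, model.config.poi_index))
    return cabinetry.fit.FitResults(
        bestfit=bestfit,
        uncertainty=cabinetry.model_utils.prefit_uncertainties(model),
        labels=list(model.config.par_names),  # assumes pyhf>=0.7
        corr_mat=np.eye(len(bestfit)),  # diagonal correlation matrix
        best_twice_nll=0.0,  # placeholder value
    )
```

The result would then be passed on as cabinetry.model_utils.prediction(model, fit_results=...) to obtain the zero-signal prediction for cabinetry.visualize.data_mc.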

Tomoya-Iizawa commented 12 months ago

Thank you for your advice. I am trying to implement the example, but it fails with

    Traceback (most recent call last):
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/fitting/run_cabinetry.py", line 665, in <module>
        run_cabinetry(args["inputs"],
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/fitting/run_cabinetry.py", line 222, in run_cabinetry
        cabinetry.visualize.data_mc(prediction_prefit,
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/setupMiniforge/miniforge3/lib/python3.9/site-packages/cabinetry/visualize/__init__.py", line 277, in data_mc
        fig = plot_model.data_mc(
      File "/afs/cern.ch/work/t/tomoya/diTauAnalysis/setupMiniforge/miniforge3/lib/python3.9/site-packages/cabinetry/visualize/plot_model.py", line 130, in data_mc
        bottom=total_yield - total_model_unc,
    ValueError: operands could not be broadcast together with shapes (3,) (18,)

The only difference w.r.t. the example is that I use params = {"g2": 10} instead of params = {"mu": 5, "uncorr_bkguncrt[0]": 1}, since g2 is the POI in our fitting framework.

Do you have any idea how to resolve this? (And is there a good place to ask this kind of question? I am not sure whether this "issues" page is the appropriate one.)

alexander-held commented 12 months ago

I think the best place is probably the Q&A section in the discussions: https://github.com/scikit-hep/cabinetry/discussions. I'll convert this over from an issue to a discussion.

With regards to the question: no, I don't have a good idea of what could cause this. You can inspect the contents of the model prediction object; it sounds like the lengths of the yield prediction and the uncertainties do not match. I do not know how this could happen. There might be a bug in my gist implementation, or a bug inside cabinetry, but I am not aware of one currently. If you can simplify your setup to a minimal version that reproduces the problem and that you can share, I can have a closer look.
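
One way to do that inspection is to compare the nesting shapes of the yields and the uncertainties channel by channel, as in the sketch below. The helper names are made up and the toy input only illustrates the idea; for a real prediction, the inputs would be its model_yields and total_stdev_model_bins attributes, whose exact nesting may depend on the cabinetry version.

```python
def nested_shape(values):
    """Lengths along each nesting level of a regular list-like object."""
    shape = []
    while not isinstance(values, str):
        try:
            length = len(values)
        except TypeError:
            break  # reached a scalar
        shape.append(length)
        if length == 0:
            break
        values = values[0]
    return tuple(shape)


def report_shapes(model_yields, total_stdev_model_bins):
    """Per-channel shapes of the yields and of their uncertainties."""
    return [
        (nested_shape(y), nested_shape(u))
        for y, u in zip(model_yields, total_stdev_model_bins)
    ]


# toy input: one channel, two samples with three bins, one uncertainty per bin
shapes = report_shapes([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]], [[0.1, 0.2, 0.3]])
```

A bin count that differs between the two entries of a channel would point to the kind of mismatch reported in the ValueError.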

scikit-hep / cabinetry

Signal overlay in data/MC plots #422