scikit-hep / pyhf

pure-Python HistFactory implementation with tensors and autodiff
https://pyhf.readthedocs.io/
Apache License 2.0
279 stars 83 forks source link

Add support for multiple parameters of interest #179

Open lukasheinrich opened 6 years ago

lukasheinrich commented 6 years ago

Description

The infrastructure should be generic enough to support limit setting on multiple POIs,

swertz commented 4 years ago

What is the status of this? I understand multiple free parameters (normfactor) are possible, but I assume they will all be profiled? It's not entirely clear from the documentation...

lukasheinrich commented 4 years ago

we don't have support for this now, but since we added capabilituy to set arbitrary parameters constant in #653 this should be simple to add, once #690 is in, we can expand this to multiple POIs. If you have a usecase, I'm happy to add this functionality

swertz commented 4 years ago

That's great! I'm interested in doing likelihood unfolding, but there are many other use cases for multi-parameter fits (for instance, fitting EFT parameters). Here CLs limits are not really useful but one would be interested in general confidence intervals, Minos intervals (while either profiling other POIs or keeping them fixed), beside simply getting the best fit and covariance matrix.

Some remarks though:

AFAIK neither of those is supported at the moment?

mswiatlo commented 4 years ago

Hi,

I think we're also interested in this now, @lukasheinrich . For the ATLAS hh4b analysis we'd like to do simultaneous ggF and VBF fits, for example, and we're also interested in SMEFT interpretations where we could fit several parameters simultaneously. It'd be great to get support in PyHF for these kind of models!

Best, Max

matthewfeickert commented 4 years ago

We can look into adding this into an upcoming release once v0.4.2 and v0.5.0 are shipped — which means soon!

alexander-held commented 4 years ago

Hi, for basic MLE fits a "POI" is not different from any other normalization factor, so this should already work at the moment. For things like parameter rankings where the calculation is done wrt. a specific parameter (= the POI) it matters.

kratsg commented 4 years ago

We will likely explore this in our run up to v0.6.0. #846 and #989 are working towards this -- since you want to fix multiple parameters as part of the fit. I will mention two things in brief:

kratsg commented 3 years ago

1051 is in, so we can move forward.

alexander-held commented 3 years ago

If you have a HistFactory XML+ROOT workspace that has multiple POIs declared, I would be interested in seeing it.

@kratsg I have one here: /afs/cern.ch/user/a/alheld/public/pyhf_multi_POI

<POI>Signal_norm Bg_norm</POI>

I am not sure whether this is using HistFactory as intended, but it gets correctly interpreted in at least two (ATLAS-internal) tools I tried it with.

jagrundy commented 3 years ago

Hi,

I was looking forward to this update being part of v0.6.0 but I can see it's now been pushed back to v.0.7.0. Do you have a time estimate for when this development will be complete? Thanks!

Kind regards, James

lukasheinrich commented 3 years ago

Now that we have toys in I think we can prfioritize this. We definitely want to deliver 0.7.0 more quickly than 0.6.0 and I hope within perhaps O(2 months) we can at least have some demo notebooks to to multi-POI fits (the code right now should already enable this a you can fits pdfs with arbitrary fixed parameters, but we're lacking convenience APIs.

@jagrundy have you tried doing this manually by any chance?

jagrundy commented 3 years ago

Hi Lukas,

Thank you for the quick reply and for the update. I haven't tried doing this manually but am aware that it's possible.

James

kratsg commented 3 years ago

Yeah, the timeline seems about right. There are a few hurdles in order to get this happening:

I think while both of these hurdles are rather minor, it's mainly figuring out a consistent API that doesn't throw off a lot of users.

matthewfeickert commented 3 years ago

... it's mainly figuring out a consistent API that doesn't throw off a lot of users.

@jagrundy @swertz @mswiatlo what would be helpful is to try to understand what API users are really expecting. As it seems that you're waiting for this can you help by doing the following:

  1. Create example workspaces with the POIs you can share with us.
  2. Create and share a script with us that uses the existing API and the example workspaces to perform what you hoping that a future API for multiple POI would do.
  3. Give examples, or explain in detail, what you're expecting the API return structure to be.

This will help us to be able to get the thing that you're actually wanting out sooner.

alexander-held commented 3 years ago

Some thoughts from my perspective:

Sally27 commented 2 years ago

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

kratsg commented 2 years ago

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

kratsg commented 2 years ago
  • Breaking the schema to use a list of strings for the POI instead of a space-separated string of POI names seems like a good idea. The latter seems prone to parsing issues. How would you handle parameter names like "Signal normalization"? Currently those are possible I think.

ROOT doesn't quite allow this in the HiFa XML spec. Not sure what we do here. Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

  • pyhf.infer.hypotest would need a kwarg to select the POI (I would expect to pick it by name). I think it is reasonable to default to the first POI if only a single POI is present, and to default to the first POI if multiple are present with a warning/info. infer.fixed_poi_fit could probably behave exactly the same, and the test statistics probably too?

Yes. Like how we handle multiple measurements.

  • A convenience function to quickly get the list of POI names for a model would be nice, to do things like:
    for poi in model.config.poi_list:
      pyhf.infer.hypotest(..., poi=poi)

Yes.

  • In addition, some convenience function to set all but one POI to constant might be nice to have. I imagine there are some use cases where multiple POIs are used for different models and the analyzer is interested in one model at a time. This could either be done by creating new models (which seems a bit convoluted maybe), or directly in the infer functions. I do not have an immediate idea for the API here, might be good to hear from some people who use this kind of setup regularly.

This would be where we'd ask for the first/second/third/nth POI fit. mle.infer.fixed_nth_poi_fit(...) which is a wrapper.

Sally27 commented 2 years ago

Hi pyhf team, do you by any chance have an update on this feature? Is it still on track for v 0.7.0?

are you asking for support in the spec or in the code? pyhf can handle multiple POIs in the code normally... since multiple fixed parameters are allowed in the fit.

Excellent, I am very glad to hear that we can do the multiple POI in the code already, I will try this in my case.

kratsg commented 2 years ago

You can look here (fixed_poi_fit is only a few-lines wrapper around pyhf.infer.mle.fit which handles all situations): https://github.com/scikit-hep/pyhf/blob/1884c6c08bc5b629724dd9ef3c60ec4386353b94/src/pyhf/infer/mle.py#L196-L202

setting fixed_params[index] = True and init_pars[index] = fixed_value is all you need, for each POI index. We just haven't gotten to being able to serialize this information consistently.

alexander-held commented 2 years ago

Do we match the weird behavior in ROOT or try to do better? Then we get into situations where workspaces in HiFa JSON are not translateable into ROOT XML if we are able to handle spaces via list of strings...

A list of strings is arguably the correct solution for this, and I think it's absolutely worth breaking this one particular roundtrip use case in favor of streamlining the experience for everything else. As far as I can tell, the issue only arises in this quite specific setup:

The extent to which things "break" is that parameter names change automatically (or have to be changed manually), and that is only relevant if they contain a space, which presumably is very rare to begin with. Given that this can be caught during conversion and the user can be informed, I think this is quite minor.

alexander-held commented 2 years ago

I tried out a roundtrip from JSON with a POI called Signal norm. Conversion to xml is fine, and xml back to JSON recovers the original spec. A problem only arises with hist2workspace, which looks for two POIs in that setup and cannot find either of them:

WARNING: Can't find parameter of interest: Signal in Workspace. Not setting in ModelConfig.
WARNING: Can't find parameter of interest: norm in Workspace. Not setting in ModelConfig.

Along similar lines, I tried out adding an emoji to a normfactor name: no obvious errors, but in the json2xml -> xml2json roundtrip there's some (presumably encoding-related) slight change: 😀 becomes \ud83d\ude00. (I'm not trying to argue that support for this kind of stuff is needed, but perhaps it may be useful to add a projection to json2xml when encountering Unicode characters, within pyhf itself I see no issues.)

When adding the same emoji to a nuisance parameter name (which has an associated histogram), json2xml works fine, but neither xml2json nor hist2workspace work. That might be a lack of support in ROOT though, since the errors seem to be related to finding the histogram in the .root file.

(edit: the last point was unrelated to pyhf, see https://github.com/scikit-hep/uproot4/issues/576)

amartyarej commented 2 years ago

Hi,

Can you please confirm if in the current version, multiple POI workspace can be converted to pyhf in JSON format as desired by default? Given that currently many analyses are using profile likelihood unfolding and some are published, it will be good to have this feature for likelihood publication.

Regards, Amartya

kratsg commented 2 years ago

pyhf xml2json functionality works normally in the JSON format. It's just not something that's implemented in terms of the functionality. The serialization should be ok.

amartyarej commented 2 years ago

I see. Thanks!

alexander-held commented 2 years ago

The "as desired" part is arguable, I would personally find a format where the POI entry is a list of strings better in the longer term. This is a breaking change however. Right now you will get a single string with spaces between POIs like

"poi": "poi_1 poi_2 poi_3"

which pyhf (as of the current implementation) will fail to convert into a model.

You could consider only specifying a single POI in the model, as this is technically metadata without immediate impact on inference.

If there will be a move to a list format ["poi_1", "poi_2", "poi_3"] at some point in the future, that would either require a workspace update or be handled in the pyhf implementation of reading the workspace information (which is tricker since currently whitespaces are supported in POI names, causing possible ambiguities).

kratsg commented 2 years ago

@amartyarej how quick of a timescale are you asking for? We have a ton of changes in the pipeline.

amartyarej commented 2 years ago

We are planning to finalize our studies by end of August