investigate pyhf / HistFactory integration

lukasheinrich commented 5 years ago

Description

One of the most widely used binned models within ATLAS (used in almost all SUSY searches) is HistFactory, a declarative specification of a binned likelihood. We re-implemented the HF pdf in pure Python using a number of tensor backends (TensorFlow, PyTorch, etc.); see e.g. the poster at ACAT ( https://indico.cern.ch/event/708041/contributions/3272095/ ) or the talk here ( https://indico.cern.ch/event/702612/contributions/2958658/attachments/1649623/2640003/pyhf.pdf ). It seems very much in line with the goals of zfit, so it would be nice to investigate possible integrations.

https://github.com/diana-hep/pyhf

cc @cranmer @kratsg @matthewfeickert

jonas-eschle commented 5 years ago

Thanks for reaching out!

As discussed already in person, this sounds like a good idea and should be easily doable. The workflow of zfit is roughly this:

[Figure: zfit workflow diagram]

So in order to connect the two libraries, one could (as discussed):

1. Take the logpdf from pyhf and wrap it in a zfit model, and do the same for the data, so that both feed into the zfit minimization. Unfortunately, this won't work currently, since zfit does not yet really support binned data.
2. Just use the loss from pyhf with SimpleLoss from zfit.loss (this takes as its argument a function that returns a tensor, the loss). The only required modification would be to have a zfit.Parameter for each free parameter and concatenate them for use in pyhf. Since I am not familiar with that second step (but I guess it should be easy to do?), I would propose that you try this in an example to see if we can "connect" the two libraries. (If the change to zfit.Parameter does not work, we could add a reassignment and splitting step in between; let me know.) In case of any problem, you can also contact us in the Gitter channel.

Once the loss is created, simply follow the simple fit example. It would be nice if that works out; then we can also think about other ways of integration. What do you think?
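The parameter handling in the second option can be illustrated with a minimal sketch in plain Python. It deliberately uses neither the pyhf nor the zfit API: `binned_nll`, `make_loss`, the parameter names, and the toy yields are all hypothetical stand-ins. The point is only to show a loss that expects one flat parameter vector (the pyhf style) being driven by individually held parameters (the zfit style) through a concatenation step.

```python
import math

# Hypothetical toy inputs: per-bin signal and background yields, observed counts.
SIGNAL = [5.0, 10.0, 3.0]
BACKGROUND = [50.0, 60.0, 40.0]
OBSERVED = [57, 68, 44]


def binned_nll(pars):
    """Poisson negative log-likelihood taking ONE flat parameter vector.

    pars[0] is a signal strength, pars[1] a background scale factor.
    This stands in for a loss that wants all parameters in one vector.
    """
    mu, bkg_scale = pars
    nll = 0.0
    for s, b, n in zip(SIGNAL, BACKGROUND, OBSERVED):
        expected = mu * s + bkg_scale * b
        # -log Poisson(n | expected), including the constant log(n!) term
        nll += expected - n * math.log(expected) + math.lgamma(n + 1)
    return nll


def make_loss(vector_loss, names):
    """Adapt a vector-valued loss to individually named parameters."""

    def loss(params):
        # Concatenate the separate parameters in a fixed, agreed order.
        flat = [params[name] for name in names]
        return vector_loss(flat)

    return loss


loss = make_loss(binned_nll, ["mu", "bkg_scale"])
value = loss({"mu": 1.0, "bkg_scale": 1.0})
```

In an actual integration, the dictionary of floats would presumably be replaced by one zfit.Parameter per free parameter, and the adapted function would be what is handed to SimpleLoss; that wiring is an assumption and is not shown here.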

P.S.: Use the newest version (from pip) to make sure that SimpleLoss works.

jonas-eschle commented 5 years ago

@lukasheinrich could you look at it? If not, and you can provide me with a standalone script that you think is good enough for a test, I'll play with it. I can also use one of the pyhf examples; let me know which one you think is the most appropriate.

jonas-eschle commented 5 years ago

I've added the possibility to use MinuitMinimizer with a "MultiParameter" (by which I mean a single zfit.Parameter that contains all parameters). As we've seen, this already worked for AdamMinimizer; now MinuitMinimizer follows. Note, however, that the FitResult and Errors won't work, and this is not a guaranteed feature (yet); let's see how it works.

It would be interesting to see a performance comparison between the (pyhf) "one-param-for-all" and the (zfit) "one-param-for-one" approaches.

jonas-eschle commented 5 years ago

@lukasheinrich any news on this? Did you have the time to try?

kratsg commented 5 years ago

> @lukasheinrich any news on this? Did you have the time to try?

Hi @mayou36, we've been a little bit busy cleaning up some loose ends on our side, and we're building up a lot of documentation as we speak. I recently gave a talk at SUSY2019 (~2 weeks ago), if you want to check it out in the meantime: https://indico.cern.ch/event/746178/contributions/3396797/

Currently, HistFactory integration exists in the sense that we can now convert from JSON to XML; that XML can be used to build the workspace with hist2workspace, and HistFactory can then be run on it.

zfit / zfit-development

investigate pyhf / HistFactory integration #56