zfit / zfit-development

The development repository for zfit with roadmaps, internal docs etc. to clean up the issues

PyMC integration #88

Open jonas-eschle opened 1 year ago

jonas-eschle commented 1 year ago

Dumping some notes here:

Hey, so I've been reading up a bit recently, increasing my understanding of it, and I think I could grasp what was puzzling me and what is going on. Here is a little hand-waving brain dump (please excuse the sloppy language used throughout; it still needs to settle before I can argue more pointedly), but maybe it helps to grasp the bigger picture of things:

In Bayesian inference, we have on the "right" the probability of the data given the parameters (the likelihood), the prior (the "probability of the parameter") and the normalization. On the left, we have the posterior, the holy grail, the thing we want to know. Having written that down, all is there.
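For reference, writing out the pieces just named (my notation, nothing beyond standard Bayes): posterior on the left; likelihood, prior, and normalization on the right:

$$
p(\theta \mid x) \;=\; \frac{p(x \mid \theta)\, p(\theta)}{p(x)}
\;=\; \frac{p(x \mid \theta)\, p(\theta)}{\int p(x \mid \theta')\, p(\theta')\, \mathrm{d}\theta'}
$$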

For the following, let's assume a general case: things are not analytically solvable, we have arbitrary distributions, complicated ones.

But we cannot (in general) simply calculate that: if we want the exact posterior (on the left), we need to actually normalize correctly (I am not 100% certain about the why, but I can accept it for the moment), and that is prohibitively expensive for non-analytically solvable cases. It would mean integrating over all the possible probabilities for all possible parameter values, weighted by the probability of the parameter values (or similar, AFAIU). Expensive! But maybe we don't need to calculate this: to infer the shape, we can also just sample from the distribution, which does not need a normalization. That's the "usual" MCMC sampling, typically using NUTS (No-U-Turn Sampler) or other random-walk algorithms.
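As a rough illustration of that sampling route (a hedged sketch only; the model, data and numbers below are made up for this note, not taken from pyhf or zfit): with PyMC you write down prior and likelihood and let NUTS draw from the unnormalized posterior directly:

```python
import numpy as np
import pymc as pm

# hypothetical observed counts, just for illustration
observed_counts = np.array([3, 5, 4, 6, 2])

with pm.Model():
    # prior on the Poisson rate (illustrative choice of hyperparameters)
    rate = pm.Gamma("rate", alpha=2.0, beta=1.0)
    # Poisson likelihood for the observed counts
    pm.Poisson("counts", mu=rate, observed=observed_counts)
    # NUTS is PyMC's default sampler for continuous parameters;
    # no explicit normalization of the posterior is ever computed
    trace = pm.sample(1000, tune=1000)

print(float(trace.posterior["rate"].mean()))
```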

Now, if we're lucky with the distributions, there is another trick to actually get the first case solved analytically: if the likelihood (the distribution of the data given the parameters) is of type A and the prior is of a conjugate type B, then the posterior will also be of type B. Namely, we can get the analytic solution on the left, the posterior (or parts of it), if there is a specific combination of prior and likelihood distributions on the right. And this is exactly what you're using (AFAIU :wink: ): since pyhf has a very narrow, well-defined scope, it offers the conjugate distributions for all the probability distributions etc. The likelihood is usually a Poisson. The prior ("prior"-ish, or what is taken as such) are the constraints - Gauss, Poisson. And for these, we know the conjugate distribution, i.e. the posterior: we know which distribution it will have, and we have an expression for its new parameters!
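To make the conjugacy point concrete with the textbook Poisson case (my example, not something pyhf-specific): for counts $x_1,\dots,x_n$ from a Poisson with rate $\lambda$ and a $\mathrm{Gamma}(\alpha,\beta)$ prior, the posterior stays in the Gamma family and the updated parameters come out in closed form:

$$
p(\lambda \mid x_1,\dots,x_n)
\;\propto\; \underbrace{\lambda^{\sum_i x_i}\, e^{-n\lambda}}_{\text{Poisson likelihood}}
\;\cdot\; \underbrace{\lambda^{\alpha-1}\, e^{-\beta\lambda}}_{\text{Gamma prior}}
\;\propto\; \mathrm{Gamma}\!\left(\alpha + \sum_i x_i,\; \beta + n\right)
$$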

In a more general case, I am not sure how this can be mixed with arbitrary distributions. But in general, the conclusions are:

- In principle, you should also be able to not use the conjugate dists and instead use one of the MC samplers: that should yield the same final distribution (100% sure that pymc offers that relatively easily, not sure how though). Actually, and I am leaning really far out of the window here, you would not even need to sample per se, as you probably have the actual posterior distribution analytically, and there are easier ways to determine e.g. its mean. But, as said, all a bit hand-waving, sloppy arguments :wink:
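To illustrate that last point for the same toy Poisson-Gamma setup as above (again a sketch with made-up numbers, not a pyhf workflow): the analytic conjugate update gives the posterior mean directly, without sampling, and it should agree with what the sampler estimates:

```python
import numpy as np

observed_counts = np.array([3, 5, 4, 6, 2])  # same hypothetical data as above
alpha, beta = 2.0, 1.0                       # same hypothetical Gamma prior

# conjugate update for a Poisson likelihood: Gamma(alpha + sum(x), beta + n)
alpha_post = alpha + observed_counts.sum()
beta_post = beta + len(observed_counts)

# mean of Gamma(alpha, beta) in the rate parameterization is alpha / beta
print(alpha_post / beta_post)  # should match the MCMC estimate within its error
```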