htjb / margarine

Code to replicate posterior probability distributions with bijectors/KDEs and perform marginal KL/Bayesian dimensionality calculations.
MIT License

User defined prior range #9

Closed DeaglanBartlett closed 1 year ago

DeaglanBartlett commented 1 year ago

The current version estimates the prior range from the posterior samples, however I have found that this can be too narrow for some purposes. It would be useful if the user could pass the prior range to the MAF function if already known, with the default calculation used if these are not passed.

williamjameshandley commented 1 year ago

From offline conversations it would be helpful to record where the current defaults a = ((n-2)xmin - xmax)/(n-3) and b = ((n-2)xmax - xmin)/(n-3) come from.

This is a relatively non-trivial piece of analysis, and is a simpler version of the solution to Exercise 22.10 in MacKay's *Information Theory, Inference, and Learning Algorithms* (available as an open-access pdf).

The problem as stated is that you have n samples, drawn from some uniform distribution with unknown limits (a,b), and you observe that the smallest (largest) sample has value xmin (xmax), what do you know about (a,b)?

The punchline is that instead of a=xmin, b=xmax, which is obviously too tight an estimate (but is the maximum-likelihood solution), you should use:

b = ((n-2)xmax - xmin)/(n-3) = xmax + (xmax-xmin)/(n-3)
a = ((n-2)xmin - xmax)/(n-3) = xmin - (xmax-xmin)/(n-3) 

i.e. you expand each bound by a fraction 1/(n-3) of the range.
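As a concrete sketch of this rule (using NumPy; the function name is illustrative, not margarine's API):

```python
import numpy as np

def estimate_prior_limits(samples):
    """Posterior-mean estimate of the uniform-prior limits (a, b):
    expand [xmin, xmax] outwards by (xmax - xmin)/(n - 3)."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    xmin, xmax = x.min(), x.max()
    spread = (xmax - xmin) / (n - 3)
    return xmin - spread, xmax + spread

# e.g. 11 samples spanning [0, 10] give limits (-1.25, 11.25)
```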

Assuming the samples are drawn between some (unknown) limits a and b, then for one sample:

P(x|a,b) = 1/(b-a) [if a<x<b, 0 otherwise]

and our likelihood given n samples is:

P(D|a,b) = 1/(b-a)^n [if all x satisfy a<x<b, 0 otherwise]

Assuming an (improper) uniform prior P(a,b) = const, then

P(a,b|D) = (n-1)(n-2)(xmax-xmin)^(n-2)/(b-a)^n [if a<xmin and b>xmax, 0 otherwise]
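A quick numerical sanity check of this normalisation (assuming scipy is available; n, xmin, xmax here are arbitrary test values):

```python
import numpy as np
from scipy import integrate

n, xmin, xmax = 10, 0.0, 1.0
norm = (n - 1) * (n - 2) * (xmax - xmin) ** (n - 2)

# Integrate P(a,b|D) over a < xmin and b > xmax; should come out to 1.
total, _ = integrate.dblquad(
    lambda b, a: norm / (b - a) ** n,   # inner variable b, outer variable a
    -np.inf, xmin,                      # a from -inf to xmin
    lambda a: xmax, lambda a: np.inf)   # b from xmax to inf
print(total)  # ≈ 1.0
```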

Marginalising out a, one can find

P(b|D) = (n-2)(xmax-xmin)^(n-2)/(b-xmin)^(n-1) [if b>xmax, 0 otherwise]

and marginalising out b:

P(a|D) = (n-2)(xmax-xmin)^(n-2)/(xmax-a)^(n-1) [if a<xmin, 0 otherwise]

These have their maximum posterior/likelihood at a=xmin and b=xmax, but the more Bayesian thing to do, if you must compress to a point estimate, is to take the posterior means, which are the values quoted above. The other (more correct) option is to draw from these distributions whenever you need a value for (a,b).
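For the "draw from these distributions" option, both marginals have closed-form inverse CDFs, so sampling is one line each. A sketch, assuming NumPy (function name illustrative):

```python
import numpy as np

def sample_prior_limits(samples, rng=None):
    """Draw (a, b) from P(a|D) ∝ (xmax-a)^-(n-1) and
    P(b|D) ∝ (b-xmin)^-(n-1) by inverse-CDF sampling:
    a = xmax - r*u^(-1/(n-2)),  b = xmin + r*u^(-1/(n-2)),
    where r = xmax - xmin and u is uniform on (0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(samples, dtype=float)
    n = x.size
    xmin, xmax = x.min(), x.max()
    r = xmax - xmin
    a = xmax - r * rng.uniform() ** (-1.0 / (n - 2))
    b = xmin + r * rng.uniform() ** (-1.0 / (n - 2))
    return a, b
```

Since u^(-1/(n-2)) ≥ 1, every draw satisfies a ≤ xmin and b ≥ xmax, and the draws have means xmin - r/(n-3) and xmax + r/(n-3), matching the point estimates above.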

htjb commented 1 year ago

Hi @DeaglanBartlett, @williamjameshandley,

Thanks for recording the detail on the derived limits here @williamjameshandley. I agree with @DeaglanBartlett that it would be helpful for the user to be able to pass known prior ranges if they are available. Could you go ahead and make a PR @DeaglanBartlett? This should be a fairly straightforward addition of a kwarg to the classes in margarine.maf and margarine.kde. Eventually I would like the two different density estimator classes to inherit from something more generic, so that updates like this only require editing a single class, but that is likely a much more extensive PR.
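To illustrate the shape of the change (hypothetical names only, not margarine's actual signatures): each class could take optional `theta_min`/`theta_max` kwargs, falling back to the sample-based estimate when they are not supplied:

```python
import numpy as np

class DensityEstimator:  # hypothetical base-class sketch
    def __init__(self, theta, theta_min=None, theta_max=None):
        theta = np.atleast_2d(np.asarray(theta, dtype=float))
        n = theta.shape[0]
        xmin = theta.min(axis=0)
        xmax = theta.max(axis=0)
        spread = (xmax - xmin) / (n - 3)
        # Use the user-supplied limits if given, else the posterior-mean default.
        self.theta_min = (xmin - spread if theta_min is None
                          else np.asarray(theta_min, dtype=float))
        self.theta_max = (xmax + spread if theta_max is None
                          else np.asarray(theta_max, dtype=float))
```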

Sorry for the slow reply! Been a bit busy.