Closed DeaglanBartlett closed 1 year ago
From offline conversations it would be helpful to record where the current defaults a =((n-2)xmin - xmax)/(n-3)
and b((n-2)xmax - xmin)/(n-3)
come from.
This is a relatively non-trivial piece of analysis, and is a simpler version of the solution to Exercise 22.10 in Mackay open-access pdf)
The problem as stated is that you have n samples, drawn from some uniform distribution with unknown limits (a,b)
, and you observe that the smallest (largest) sample has value xmin
(xmax
), what do you know about (a,b)
?
The punchline is that instead of b=xmax, a=xmin, which is obviously too tight an estimate (but is the maximum likelihood solution) you should instead use:
b = ((n-2)xmax - xmin)/(n-3) = xmax + (xmax-xmin)/(n-3)
a = ((n-2)xmin - xmax)/(n-3) = xmin - (xmax-xmin)/(n-3)
i.e. you expand each bound by a fraction 1/(n-3)
of the range.
Assuming the samples are drawn between some (unknown) limits a and b, then for one sample:
P(x|a,b) = 1/(b-a) [if a<x<b, 0 otherwise]
and our likelihood given n samples is:
P(D|a,b) = 1/(b-a)^n [if all x satisfy a<x<b, 0 otherwise]
Assuming an (improper) uniform prior P(a,b) = const
, then
P(a,b|D) = (n-1)(n-2)/(xmax-xmin)^(n-2)/(b-a)^n [if a<xmin and b>xmax, 0 otherwise]
Marginalising out a
, one can find
`P(b|D) = (n-2)/(xmax-xmin)^(n-2)/(b-xmin)^(n-1)``
and b
:
P(a|D) = (n-2)/(xmax-xmin)^(n-2)/(xmax-a)^(n-1)
These have a maximum posterior/likelihood at a=xmin
and b=xmax
, but the more Bayesian
thing to do if you must compress is to choose the posterior mean, which are
the values quoted above. The other (more correct) option is to draw from these distributions whenever you need a value for (a,b)
.
Hi @DeaglanBartlett, @williamjameshandley,
Thanks for recording the detail on the derived limits here @williamjameshandley. I agree with @DeaglanBartlett that it would be helpful for the user to be able to pass known prior ranges if they are available. Could you go ahead and make a PR @DeaglanBartlett? This should be a fairly straightforward addition of a kwarg to the classes in margarine.maf
and margarine.kde
. Eventually I would like the two different density estimator classes to inherit from something more generic so that updates like this only require the editing of one class but that is likely a much more extensive PR.
Sorry for the slow reply! Been a bit busy.
The current version estimates the prior range from the posterior samples, however I have found that this can be too narrow for some purposes. It would be useful if the user could pass the prior range to the MAF function if already known, with the default calculation used if these are not passed.