104 add an accuracy based metric

This adds a validation metric to the PosteriorCoverage object which calculates the total log probability of our test data $(x_i, \theta_i)$ under our model posterior $q(\theta|x)$, i.e. $\sum_i \log q(\theta_i|x_i)$. This is a metric of constraining power.

It works by taking the posterior samples already calculated in PosteriorCoverage, fitting a Gaussian KDE to them as a variational distribution that gives a normalized density $p(\theta)$, and evaluating $p(\theta_o)$ at the true parameters $\theta_o$ of every test observation. This fails if:

- We don't have enough posterior samples (we need at least `N_params^2`). Then the Gaussian KDE will not fit well.
- We don't have enough test data. Then the average of the log-probabilities over the test set may be biased.
- Our test data often falls on the edge of sharp priors. The Gaussian KDE assumes infinite and continuous support.
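
For reviewers, the procedure described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function name `total_log_prob`, the array shapes, and the synthetic data are all assumptions made for the example, and the KDE is `scipy.stats.gaussian_kde` with its default bandwidth.

```python
import numpy as np
from scipy.stats import gaussian_kde


def total_log_prob(posterior_samples, true_params):
    """Hypothetical helper (not the ltu-ili API).

    posterior_samples: array of shape (n_test, n_samples, n_params)
    true_params:       array of shape (n_test, n_params)
    Returns the summed log probability and the per-test-point values.
    """
    logprobs = np.empty(len(true_params))
    for i, (samples, theta_o) in enumerate(zip(posterior_samples, true_params)):
        # gaussian_kde expects data of shape (n_params, n_samples)
        kde = gaussian_kde(samples.T)
        # evaluate the KDE log-density at the single true parameter vector
        logprobs[i] = kde.logpdf(theta_o[:, None])[0]
    return logprobs.sum(), logprobs


# Synthetic check (assumed data): a posterior centered on the truth
# should score higher than one shifted away from it.
rng = np.random.default_rng(0)
n_test, n_samples, n_params = 20, 500, 2
theta_true = rng.normal(size=(n_test, n_params))
noise = rng.normal(size=(n_test, n_samples, n_params))
centered = theta_true[:, None, :] + noise
shifted = theta_true[:, None, :] + 3.0 + noise

total_centered, per_point = total_log_prob(centered, theta_true)
total_shifted, _ = total_log_prob(shifted, theta_true)
print(total_centered > total_shifted)
```

Note that the KDE places nonzero density outside any hard prior bounds, which is the third failure mode listed above.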

This is just a hacky solution to have an accuracy-like metric for now. It is used in other works (e.g. https://arxiv.org/abs/1805.07226) and is better than e.g. calculating the scatter of true vs. predicted parameters, but it is not as good or robust as SOTA metrics. However, as Table 1 of https://arxiv.org/abs/2101.04653 shows, those SOTA metrics often require access to either the true posterior or fast simulators.

maho3 / ltu-ili

104 add an accuracy based metric #106