stan-dev / posteriordb

Database with posteriors of interest for Bayesian inference

Nested sampling posteriors from ultranest #221

Closed by JohannesBuchner 3 years ago

JohannesBuchner commented 3 years ago

Dear all,

This seems like an interesting project. However, I am not entirely sure of the scope, so I hope it is okay to ask a few questions here.

I was wondering if there is interest in adding additional algorithms and their output to posteriordb. I am thinking in particular of https://johannesbuchner.github.io/UltraNest/, a package based on nested sampling. It may not outperform other methods speed-wise, especially in more than 20 dimensions, but it is quite reliable on multi-modal posteriors.

Below is code that runs the eight-schools model with UltraNest and produces posterior samples:

import numpy as np
import scipy.stats
from ultranest import ReactiveNestedSampler

# number of schools
J = 8
# estimated treatment effects (shape (1, J) so they broadcast
# against a batch of parameter vectors)
y = np.array([[28, 8, -3, 7, -1, 1, 18, 12]])
# standard errors of the estimated effects
sigma = np.array([[15, 10, 16, 11, 9, 11, 10, 18]])

paramnames = ['tau', 'mu'] + ['theta%d' % (i+1) for i in range(J)]

tau_prior = scipy.stats.cauchy(0, 5)
mu_prior = scipy.stats.norm(0, 5)
stdnorm = scipy.stats.norm()

# transform unit cube (probability units) to parameters
def transform(cube):
    params = cube.copy()
    # tau (hyper-parameter: between-school standard deviation)
    #     uninformative prior tau ~ cauchy(0, 5)
    params[:, 0] = tau_prior.ppf(cube[:, 0])
    # mu (hyper-parameter of mean)
    #     mu ~ normal(0, 5)
    params[:, 1] = mu_prior.ppf(cube[:, 1])
    # thetas: treatment effect in school j
    #     thetas ~ normal(mu, tau)
    params[:, 2:] = (stdnorm.ppf(cube[:, 2:]) * params[:, 0].reshape((-1, 1))
                     + params[:, 1].reshape((-1, 1)))
    return params

def loglike(params):
    # Gaussian log-likelihood of the data y (additive constants dropped)
    thetas = params[:, 2:]
    like = -0.5 * (((thetas - y)/sigma)**2).sum(axis=1)
    return like

sampler = ReactiveNestedSampler(paramnames, loglike, transform=transform, 
    log_dir='8schools', resume='overwrite', vectorized=True)
results = sampler.run(min_ess=400)
sampler.print_results()
sampler.plot()

Posterior samples (resampled to equal weight): equal_weighted_post.txt
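For readers unfamiliar with the "resampled to equal weight" step: nested sampling natively produces importance-weighted draws, and resampling points with probability proportional to their weights yields an unweighted posterior sample like the file above. A minimal sketch of that idea (the points and weights here are synthetic stand-ins, not output from the run above):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic weighted draws standing in for nested-sampling output:
# 5000 points in 10 dimensions, with log importance weights.
points = rng.normal(size=(5000, 10))
logw = rng.normal(size=5000)

# Normalize the weights in a numerically safe way.
weights = np.exp(logw - logw.max())
weights /= weights.sum()

# Resample indices with probability proportional to weight;
# the result is an equally weighted posterior sample.
idx = rng.choice(len(points), size=2000, replace=True, p=weights)
equal_weighted = points[idx]

print(equal_weighted.shape)  # (2000, 10)
```

The resampled mean matches the weighted mean of the original draws up to resampling noise, which is how the equal-weighted file can be used directly for posterior summaries.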

At the risk of stepping on some toes: the Bayesian evidence is reported as logZ = -3.928 +- 0.087.

If likelihood functions are available as R code, they can be used via the UltraNest R wrapper. Prior transform functions would need to be added, however.
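To make concrete what a prior transform function involves, here is a hedged sketch independent of the model above (the two-parameter model and its priors are invented for illustration): each unit-cube coordinate is pushed through the inverse CDF (`ppf`) of the corresponding prior.

```python
import numpy as np
import scipy.stats

# Hypothetical two-parameter model: slope ~ Normal(0, 10), scale ~ LogNormal(0, 1).
# A nested sampler hands us points u in the unit hypercube; applying each
# prior's inverse CDF (ppf) maps them to draws from that prior.
slope_prior = scipy.stats.norm(0, 10)
scale_prior = scipy.stats.lognorm(s=1.0)

def transform(u):
    params = np.empty_like(u)
    params[..., 0] = slope_prior.ppf(u[..., 0])
    params[..., 1] = scale_prior.ppf(u[..., 1])
    return params

# The cube midpoint (0.5, 0.5) maps to each prior's median.
print(transform(np.array([0.5, 0.5])))
```

The same pattern works batched (an array of shape `(n, 2)` in, `(n, 2)` out), which is what a vectorized sampler needs.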

MansMeg commented 3 years ago

Hi!

I think this sounds interesting. One of the main purposes of posteriordb is to enable algorithm comparison, so what you are doing is one use case. However, I do not think it is worthwhile to include another set of posterior draws, since the ones we have are more or less a gold standard.

We have an open issue about adding the likelihoods and priors to the models, so one could use those. That is not yet implemented, though.

Did this answer your question?

JohannesBuchner commented 3 years ago

Thanks, I think I understand better now. You are trying to assemble a gold-standard posterior for each model, but the comparison of algorithms is out of scope and left to the users of this project.

MansMeg commented 3 years ago

Yes.