johannesulf / nautilus

Neural Network-Boosted Importance Nested Sampling for Bayesian Statistics
https://nautilus-sampler.readthedocs.io
MIT License

variable nautilus execution times #38

Closed simoncasassus closed 1 year ago

simoncasassus commented 1 year ago

Hi, thanks very much for nautilus; it sped up my dynesty optimizations by a factor of 3! I am noticing some strange behavior in terms of execution times: a simple fit to a parabola, with 3 parameters, can take anywhere between 4 s and 19 s. I attach the example code. Is this behavior expected with Nautilus? Cheers (my apologies if these variable execution times are already discussed somewhere).

import sys
import os
import os.path

import multiprocessing
from multiprocessing import Pool, Queue, current_process, set_start_method, get_context

import numpy as np
from nautilus import Sampler
from time import time
import corner

import cupy as cp

ctx = get_context('spawn')

cuda_devices = [0, 1, 2]


def prior_transform(domain, unif_free):
    """Transforms the uniform random variable u ~ Unif[0., 1.) to the
    parameter of interest x ~ Unif[-10., 10.)."""
    x_free = np.zeros_like(unif_free)
    bnds = domain
    for iparam in list(range(len(unif_free))):
        x_free[iparam] = bnds[iparam][0] + (
            bnds[iparam][1] - bnds[iparam][0]) * unif_free[iparam]

    return x_free


def lnlike(Data, xfree, dum=1, n_gpus=1):
    yo = Data[0]['y']
    xo = Data[0]['x']
    a = xfree[0]
    b = xfree[1]
    c = xfree[2]
    ym = dum * (a * xo + b + c * xo**2)
    logL = -0.5 * np.sum((ym - yo)**2)
    logL = logL.item()
    return logL

def main():

    gendomain = [
        [-5, 5],
        [-5., 5],  # dimensionless, relative to v_Kep.
        [-5., 5],  # dimensionless, relative to v_Kep.
    ]

    nmesh = 100
    xo = np.arange(nmesh) / nmesh - 0.5

    zdum = 10.
    ydata_0 = zdum * (2. * xo + 3 - 1 * xo**2)

    mu = 0.
    sigma = 0.1
    noise = np.random.normal(mu, sigma, len(xo))
    yo = ydata_0 + noise

    lnlikekwargs = {'dum': zdum}
    n_dim = 3

    Data = []
    Data.append({'x': xo, 'y': yo})

    dsampler = Sampler(prior_transform,
                       lnlike,
                       pass_dict=False,
                       n_dim=n_dim,
                       n_live=10,
                       prior_args=[gendomain],
                       likelihood_args=[Data],
                       likelihood_kwargs=lnlikekwargs)
    dsampler.run(verbose=True)

    points, log_w, log_l = dsampler.posterior()
    np.save('dresult_points', points)
    np.save('dresult_log_w', log_w)
    np.save('dresult_log_l', log_l)

    points = np.load('dresult_points.npy', allow_pickle=True)
    log_w = np.load('dresult_log_w.npy', allow_pickle=True)
    log_l = np.load('dresult_log_l.npy', allow_pickle=True)

    naut_results = list(
        map(lambda v: (v[1], v[2] - v[1], v[1] - v[0]),
            zip(*np.percentile(points, [16, 50, 84], axis=0))))

    names = ['a', 'b', 'c']

    ibestparams = np.argmax(log_l)
    bestparams = points[ibestparams, :]

    print("param     distrib     max ")
    for iparam in list(range(n_dim)):
        print(names[iparam], naut_results[iparam], bestparams[iparam])

    fig = corner.corner(points, weights=np.exp(log_w), labels='abc')
    fig.savefig("triangle_test_nautilus.png")

if __name__ == "__main__":
    time_start = time()
    main()
    time_end = time()
    print("Execution time for Gen_surface.project (top): %s s" % (time_end - time_start))

johannesulf commented 1 year ago

This doesn't appear to be a bug and is most likely expected. I didn't quite observe the strong variations you reported but only ran the script 10 times. The issue is most likely the very, very low number of live points. Even for simple examples, I'd recommend at least a few hundred live points. With only 10 live points, the quality of the initial exploration can vary dramatically such that sometimes you need a lot more likelihood evaluations to reach the effective sample size of 10,000, the default stopping criterion for sampling. With just 100 live points, the variations in the runtime disappear for this problem and you always need around 14,000 likelihood calls, much less than with 10 live points. Happy to help if you have more questions.
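For illustration, here is a minimal self-contained sketch of that comparison, timing the same toy parabola fit with the two n_live values discussed above (10 and 100). The simplified prior and likelihood, the repeat count, and the output formatting are only placeholders, not code from the thread:

import numpy as np
from nautilus import Sampler
from time import time

# Toy parabola data mirroring the script above.
xo = np.arange(100) / 100 - 0.5
yo = 10. * (2. * xo + 3 - 1 * xo**2) + np.random.normal(0., 0.1, len(xo))


def prior(u):
    # Map the unit cube to the box [-5, 5) in all three parameters.
    return -5. + 10. * u


def lnlike(x):
    a, b, c = x
    ym = 10. * (a * xo + b + c * xo**2)
    return -0.5 * np.sum((ym - yo)**2)


for n_live in [10, 100]:
    for repeat in range(3):
        sampler = Sampler(prior, lnlike, pass_dict=False, n_dim=3, n_live=n_live)
        t0 = time()
        # Runs until the default stopping criterion, an effective sample size of 10,000.
        sampler.run(verbose=False)
        print(n_live, repeat, '%.1f s' % (time() - t0))

With the larger number of live points the initial exploration is more consistent, so the wall-clock time should vary far less from run to run.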

simoncasassus commented 1 year ago

Thanks very much, this makes sense.