Open ZcharlieZ opened 6 months ago
Hi again!
I think I have a solution, but I would like to ask for your kind confirmation. By using results.samples_equal(),
I obtain equally weighted posterior samples for each parameter. In this way, I can use them to compute the 16th, 50th, and 84th percentiles of my relation at fixed x.
Could you confirm it, please?
Many thanks in advance
Hi,
The default results.samples have weights associated with them, so any statistic computed from them needs to use those weights. dynesty contains code to calculate the mean and covariance from weighted samples, but if you want to compute things like the median, I think it is easier to work with the unweighted samples from samples_equal, which provides standard equally weighted samples that you can treat like regular MCMC samples.
You can also use the quantile function in utils to do exactly this: it computes quantiles following a weighted CDF built from the ordered samples.
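For illustration, here is a minimal pure-numpy sketch of such a weighted quantile. It mirrors the idea (sort the samples, build a weighted CDF, interpolate), not the library code itself; the names and the midpoint convention are my own choices, and in practice you would just call dynesty.utils.quantile with weights = exp(logwt - logz[-1]).

```python
import numpy as np

def weighted_quantile(x, q, weights=None):
    """Quantiles following a weighted CDF of the ordered samples.

    A pure-numpy sketch of the idea behind dynesty.utils.quantile;
    not the library code itself.
    """
    x = np.asarray(x, dtype=float)
    q = np.atleast_1d(q)
    if weights is None:
        return np.percentile(x, 100.0 * q)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    xs, ws = x[order], w[order]
    # weighted CDF with a midpoint convention, then interpolate
    cdf = (np.cumsum(ws) - 0.5 * ws) / ws.sum()
    return np.interp(q, cdf, xs)

# toy usage: 16/50/84% quantiles of weighted samples
rng = np.random.default_rng(18)
samples = rng.normal(0.0, 1.0, 10000)
weights = np.ones_like(samples)  # stand-in for exp(logwt - logz[-1])
q16, q50, q84 = weighted_quantile(samples, [0.16, 0.5, 0.84], weights)
```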
Hi all!
Thanks for your replies. Let me go a bit more into the details of what I'm doing. I'm sampling the PDFs using the Dynamic Nested Sampler. The configuration is the following:
dsampler = dynesty.DynamicNestedSampler(log_likelihood, prior_transform, ndim=len(parameter_labels),
bound='multi', sample='rslice', rstate=rstate, pool=pool, queue_size=250, nlive=1000)
dsampler.run_nested(dlogz_init=0.01, print_progress=True)
The first thing I noticed is that, keeping everything else the same and using the same random state (rstate=np.random.default_rng(18)), changing the sample method from rslice to rwalk changes the PDFs significantly. Although the number of parameters is lower than 10, for which unif is suggested, I am using one of the two methods above because sampling with unif is very slow. Below you find the traceplots using rslice and rwalk, respectively. In both cases, I also noticed that, after starting the run, the dlogz value printed by print_progress has some starting value, then for a while becomes much larger, and finally decreases until it reaches convergence below the threshold (dlogz_init=0.01).
[traceplot using rslice]
[traceplot using rwalk]
The second set of samples looks closer to what I would expect. In addition, I tried to sample the PDFs using an MCMC algorithm. In that case, the PDFs differ from the rwalk ones, but they look a bit more Gaussian.
Finally, coming to the use of weights: the PDFs of the parameters become considerably wider when I use the weighted samples. Indeed, when I plot the final relation, the 1-sigma uncertainty band around the curve is very large. Below I attach the corner plots and the relations for the unweighted (blue) and weighted (red) samples.
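To make my expectation concrete, here is a toy check with purely synthetic numbers (not my actual data, and no dynesty calls): broad raw samples with importance weights that concentrate on a narrower target, compared unweighted versus resampled to equal weights (which is effectively what samples_equal / resample_equal does).

```python
import numpy as np

rng = np.random.default_rng(18)

# toy stand-ins: broad "raw" samples with importance weights that
# concentrate on a narrower target (N(0,1) reweighted from N(0,3))
samples = rng.normal(0.0, 3.0, 20000)
logwt = -0.5 * samples**2 * (1.0 - 1.0 / 9.0)  # log target/proposal ratio
weights = np.exp(logwt - logwt.max())
weights /= weights.sum()

# 68% interval ignoring the weights (treating raw samples as a posterior)
lo_u, hi_u = np.percentile(samples, [16, 84])

# 68% interval after resampling to equal weights
idx = rng.choice(samples.size, size=samples.size, p=weights)
lo_w, hi_w = np.percentile(samples[idx], [16, 84])
```

In this toy case the weighted interval is the narrower one, which is exactly the behaviour I was expecting and the opposite of what I see in my run.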
To summarise, I list my questions below.
Thank you very much for your help, very appreciated!
Carlo
Hi @segasai,
thanks for your previous reply. I reply to your previous points here below.
Regarding the weighted samples: I was expecting the parameter PDFs to be tighter than those of the unweighted samples, because in the latter all points are considered equally, even those far from the maximum-likelihood region. Instead, it is exactly the opposite.
Following your suggestion about increasing the number of live points, I performed some tests. In each test, using NestedSampler, I kept the following parameters fixed: bound='multi', rstate=np.random.default_rng(18), queue_size=256, dlogz=0.01. I varied both the sampling method ('rwalk' or 'rslice') and the number of live points (4096, 8192, 16384).
As is clearly visible, all the PDFs are dramatically different from each other, regardless of the number of live points or the sampling method. Do you think I still need to increase the number of live points even further? Apparently, I am not seeing any improvement. For completeness, here is the print_progress output for each case:
rwalk 4096 - iter: 186835 | +4096 | bound: 279 | nc: 1 | ncall: 5118636 | eff(%): 3.733 | loglstar: -inf < 1277.795 < inf | logz: 1236.768 +/- 0.092 | dlogz: 0.000 > 0.010;
rwalk 8192 - iter: 289800 | +8192 | bound: 282 | nc: 1 | ncall: 7718960 | eff(%): 3.865 | loglstar: -inf < 1280.933 < inf | logz: 1250.143 +/- 0.059 | dlogz: 0.000 > 0.010;
rwalk 16384 - iter: 616921 | +16384 | bound: 317 | nc: 1 | ncall: 16364828 | eff(%): 3.874 | loglstar: -inf < 1280.754 < inf | logz: 1247.686 +/- 0.043 | dlogz: 0.000 > 0.010;
rslice 4096 - iter: 166164 | +4096 | bound: 210 | nc: 1 | ncall: 11000939 | eff(%): 1.548 | loglstar: -inf < 1280.926 < inf | logz: 1244.949 +/- 0.089 | dlogz: 0.000 > 0.010;
rslice 8192 - iter: 298699 | +8192 | bound: 221 | nc: 1 | ncall: 18752259 | eff(%): 1.637 | loglstar: -inf < 1280.970 < inf | logz: 1249.095 +/- 0.060 | dlogz: 0.000 > 0.010;
rslice 16384 - iter: 610081 | +16384 | bound: 363 | nc: 1 | ncall: 37222505 | eff(%): 1.684 | loglstar: -inf < 1280.965 < inf | logz: 1248.315 +/- 0.043 | dlogz: 0.000 > 0.010.
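As a quick consistency check, using only the logz values and quoted statistical errors listed above, I compared the runs pairwise in units of their combined error (the quoted +/- does not include any sampling-method systematics; the helper name is mine):

```python
import numpy as np

# logz +/- quoted error from the six runs listed above
runs = {
    "rwalk-4096":   (1236.768, 0.092),
    "rwalk-8192":   (1250.143, 0.059),
    "rwalk-16384":  (1247.686, 0.043),
    "rslice-4096":  (1244.949, 0.089),
    "rslice-8192":  (1249.095, 0.060),
    "rslice-16384": (1248.315, 0.043),
}

def logz_tension(a, b):
    """Difference between two runs' logz in units of their combined error."""
    (z1, e1), (z2, e2) = runs[a], runs[b]
    return abs(z1 - z2) / np.hypot(e1, e2)

for a in runs:
    for b in runs:
        if a < b:
            print(f"{a} vs {b}: {logz_tension(a, b):6.1f} sigma")
```

Even the two largest rslice runs disagree at many sigma, so the quoted errors clearly underestimate the true run-to-run scatter.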
The version of dynesty used is always the latest one, i.e. v2.1.3.
Many thanks in advance for the help.
Carlo
Hi all!
I have a doubt about how to plot the results after sampling. Suppose I want to fit a linear relation. After the sampling, I have the posterior of all the parameters of the model, which gives me N curves. Thus, at a fixed x value, I can compute, for instance, the median and the 16th and 84th percentiles across the curves. In the case of dynesty, however, weights are used. The posterior should be weighted and, if I am not mistaken, the weight is exp(logwt - logz[-1]), as used for the PDFs in the dyplot traceplots and corner plots. If everything above is correct, how can I plot the median and the uncertainties of the model?
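To make the question concrete, here is a toy sketch of what I have in mind, with all names and numbers illustrative: in real use, the parameters would come from results.samples and the weights from np.exp(results.logwt - results.logz[-1]).

```python
import numpy as np

rng = np.random.default_rng(18)

# toy posterior for y = a*x + b with importance weights
# (stand-ins for results.samples and exp(results.logwt - results.logz[-1]))
n = 5000
a = rng.normal(2.0, 0.1, n)
b = rng.normal(1.0, 0.3, n)
logwt = rng.normal(0.0, 0.5, n)  # fake log-weights, for shape only
weights = np.exp(logwt - logwt.max())
weights /= weights.sum()

xgrid = np.linspace(0.0, 10.0, 50)
curves = a[:, None] * xgrid[None, :] + b[:, None]  # one curve per sample

# resample to equal weights (as samples_equal does), then take
# percentiles of the curves at each fixed x
idx = rng.choice(n, size=n, p=weights)
lo, med, hi = np.percentile(curves[idx], [16, 50, 84], axis=0)
# med is the median curve; (lo, hi) bound the 68% band, e.g. for
# plt.fill_between(xgrid, lo, hi)
```

Is this resample-then-take-percentiles approach the correct way to propagate the weights into the plotted band?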
Many thanks in advance