issue with weights and posteriors comparing v0.9.5.3 and v2.1.1

mmfausnaugh commented 1 year ago

Dynesty version v2.1.1, pip install dynesty

Describe the bug This is an extension of issue #381 . Basically, I am seeing inconsistent results in the weights and posteriors on an old computer using v0.9.5.3 and a new Apple M1 chip with v2.2.1.

I updated example code at https://github.com/mmfausnaugh/SN_analysis_debug/tree/main To run the test, you would clone the directory and do python analyze_SN_curved_PL_dnest.py

v0.9.5.3 behaves as I expect, and accpetence fractions are about 20%. With v2.1.1., I get acceptance fractions less than 1%, and the posteriors for one of my tests looks very strange.

Setup See example code in repository: https://github.com/mmfausnaugh/SN_analysis_debug/tree/main

Dynesty output The example code with save the output from dynesty as a .npz. The biggest issue is that with 2.1.1., the weights are set to < 1.e-20 for one of my tests.

Bug

The code runs to completion, but the posterior I get for one object causes problems. I can't actually plot it with dynesty functions---you can reproduce my issues running `python plot_dynesty_chains.py test_2022exc_dynesty_results.npz' after running the above example code. Here is the output that is failing in dynesty:

/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py:1368: UserWarning: Attempting to set identical low and high ylims makes transformation singular; automatically expanding.
  ax.set_ylim([0., max(n) * 1.05])
/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py:2310: RuntimeWarning: invalid value encountered in true_divide
  sm /= sm[-1]
WARNING:root:No points at all in the plotted region
Traceback (most recent call last):
  File "/Users/faus/SN_analysis/SN_analysis_debug/plot_dynesty_chains.py", line 46, in <module>
    fig,axes = dyplot.cornerplot(results,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py", line 1451, in cornerplot
    _hist2d(y,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/dynesty/plotting.py", line 2364, in _hist2d
    ax.contourf(X2,
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/__init__.py", line 1442, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/axes/_axes.py", line 6467, in contourf
    contours = mcontour.QuadContourSet(self, *args, **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 769, in __init__
    kwargs = self._process_args(*args, **kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1411, in _process_args
    x, y, z = self._contour_args(args, kwargs)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1460, in _contour_args
    self._process_contour_level_args(args, z.dtype)
  File "/Users/faus/miniconda3/envs/py310/lib/python3.10/site-packages/matplotlib/contour.py", line 1143, in _process_contour_level_args
    raise ValueError("Contour levels must be increasing")
ValueError: Contour levels must be increasing

The behaviour I expected was: when running on v0.9.5.3, the output plots just fine and the results look as expected..

To reproduce the behavour I see, please see https://github.com/mmfausnaugh/SN_analysis_debug/tree/main .

Additional context Sorry that it has taken so long for me to come back to this!

This is a link back to issue #381 , my issue is persisting in v2.1.1.

I've updated my test example with data that more clearly shows the issue: https://github.com/mmfausnaugh/SN_analysis_debug/tree/main To run the test, you would clone the directory and do python analyze_SN_curved_PL_dnest.py

There are two tests, one for lightcurves of an object called 2020tld, and one with an object called 2022exc. Tests for 2020tld work as expected. 2022exc disagrees strongly between the two versions of dynest. In fact, I cannot plot the results from v2.1.1 because the assigned weights are all < 1.e-20, except for the last entry which is 1.0 .

In both cases, the acceptance fraction is small when using dynesty v2.1.1, less than 1%. on 0.9.5.3, the acceptance fraction is about 20% for both cases, and the posteriors look about the way I expect.

I don't think this is a matter of stochasticity. I am seeding the random number generator, so I would expect to get the same results using both versions of dynesty (I think?). For 2020tld, the posteriors are slightly different, but consistent with each other (the low acceptance fraction however makes 2020tld run much slower on v2.1.1).

For 202exc, the output weights demonstrates the specific problem I was having.

I did not put the .npz files into git, but they can be regenerated in the test. Otherwise, I'll be happy to send the files directly.

Please let me know if you have any questions that I can clarify or there are issues with running my example!

Thanks, --Michael

segasai commented 1 year ago

Ok. There are possibly many issues in play here, but if you put bootstrap=0 when initializing the nested sampler things go much faster and without issues. I think there is something going on here with bootstrap. The bootstraping of bounds will naturally lead to lower acceptance rate, but it is unclear why it slows down things so much and lead to issues...

mmfausnaugh commented 1 year ago

Thanks, setting bootstrap=0 helps a bit. I was able to make the trace plot at least for v2.1.1; attaching that side by side with v0.9.5.3.

It still looks like there is something odd about the posterior samples in the new version, but there are some other hyperparameteres you can advise on?

output from v0.9.5.3, looks like well behaved posteriors: test_2022exc_dynesty_results npztrace_v0 9 5 3

output from v2.1.1, very spikey/narrow posteriors test_2022exc_dynesty_results npztrace_v2 1 1

segasai commented 1 year ago

Thanks. Could you please make a test case that consists solely of

# load the data 
# import module with the likelihood function

dns = DynnestedSampler(module.like,module.prior_transform ,...)
dns.run_nested()
# make a plot or ideally compute some number that indicate there is an issue or not

that would illustrate the problem

So I can easily diagnose the issue/run without digging into the source codes of your files.

Also from running some tests over the weekend one of my impressions is the following. You have here a thin/curved and heavy tailed posterior. In this particular case the ellipsoidal bounding will have particularly tough time. It doesn't really matter for non-uniform samplers, but uniform sampler must ensure that the boundary fully enclose the L>L* region. In this case this leads to large bootstrap enlarge factors. There could be other issues out there, but at least one thing that come to mind I think we should implement is a warning if bootstrap enlarge factors are too large.

The suggestion of setting bootstrap=0 is not a real solution, it was just testing what's playing a role (and the previous behaviour default behaviour of dynesty was exactly using bootstrap=0, but that does lead to biased posteriors).

mmfausnaugh commented 1 year ago

OK, makes sense!

I've updated the example scripts so that it follows the structure you request; i.e., uncovering the call to dynesty.DynamicNestedSampler in the top level script.

It will also automatically make plots; and it will check if the posteriors seem too narrow, by calculating the standard deviation of a few parameters. But, this isn't fool proof because if there are multiple modes the standard deviation is artificially inflated. FWIW, tests for 0.9.5.3 suggest that there shouldn't be multiple modes in most cases. So, looking at the plots is best (I'll try to think of something more clever).

And, of course: I'm very happy to edit or modify the example script if there are issues or something that would make testing this easier.

segasai commented 1 year ago

Thanks, but it doesn't work as it is:

(pyenv310) skoposov@milkyway:~/curwork/SN_analysis_debug$ python analyze_SN_curved_PL_dnest.py 
Traceback (most recent call last):
  File "/home/skoposov/curwork/SN_analysis_debug/analyze_SN_curved_PL_dnest.py", line 14, in <module>
    import SN_model_fits.result_params as result_params
ModuleNotFoundError: No module named 'SN_model_fits.result_params'

also right now your runs are not deterministic, as you are not passing the rstate to the nested sampler.

mmfausnaugh commented 1 year ago

Sorry, I forgot to push the results.params.py file---that should run now?

segasai commented 1 year ago

yes, thanks it works now. I'll try to take a look at it in next few days

segasai commented 1 year ago

I have now identified one oversight that was made in https://github.com/joshspeagle/dynesty/pull/286 Basically when ellipsoids are being split the splitting was not aggressive enough (less agressive than in the past) and that lead to lower efficiencies. I have a tentative fix in this branch https://github.com/joshspeagle/dynesty/tree/splitting_fix3 But I'll need to think about this a bit more.

It is possible that there are some other issues as well, but broadly I think the main issues were

when your posterior is heavily non-gaussian (curved + heavy tailed) then the default bootstrap leads to bounds that are much larger than the L>L* therefore making sampling very ineficient
on top of that somehow this problem in particular was affected by less agressive split of ellipsoids leading again to non-efficient sampling.

Overall one of the takes from this is that

uniform sampling is very sensitive to the boundary. In general sampling like rslice/rwalk are not affected by these issues (they have their own ones)

segasai commented 1 year ago

Okay, after some investigation:

Final last two points

The key problem in your code was 'maxcall' limit. That forced the sampling to interrupt earlier and thus give incomplete posterior
Yes, the default mode is now to bootstrap (because that gives a more correct posteror). That leads to slower sampling thus leading you to hit the maxcall boundary

In the #443

added the warning if bootstrap factor is large (i.e. if the ellipsoidal approximation is bad)
added a warning if the code stops too early due to maxcall/maxiter

Using the #443 your analyze_ script with bootstrap=0 removed and maxcall limit removed runs fine (although not fast and producing a few warnings about inefficient sampling), but gives correct results I'll be closing the issue soon.

mmfausnaugh commented 1 year ago

OK great!

If I'm understanding correctly, you are saying that the low efficiency is caused by the specifics of the likelihood combined the way the ellipsoidal approximation works?

And the fix for more efficient sampling is to use rwalk or rslice to sample?

I'm also still wondering if there is any configuration that would reproduce the same behavior as v0.9.5.3? (i.e., using the default sampling and giving a similar efficieny)

segasai commented 1 year ago

Low efficiency is indeed caused by posterior shape and the ellipsoidal approximation not being ideal here.

If you use bootstrap=0 you get behaviour similar to the one you had in 0.9.5.3 -- keep in mind that will lead to potentially biased posteriors (as bootstrap ensures that ellipsoidal boundary is encompassing L>L* )

Without that the options are

do not force bootstrap=0 the sampling will be slow/(slower)
use rslice/rwalk samplers
Use more live-points may help
Reparametrize the problem so the posterior contours are more ellipsoidal in shape and/or do not have fat tails.

Also you do need to remove the maxcall limit (with the limit in place there is no guarantee that your posterior will be properly sampled); now you'll see a warning in the end if you use maxcall and the fit stops early. (alternatively you can keep the maxcall in place, but then you'll need to check if the effective number of samples you get is sufficient)

mmfausnaugh commented 1 year ago

OK, sounds great! Thanks for your help and looking into this.

joshspeagle / dynesty

issue with weights and posteriors comparing v0.9.5.3 and v2.1.1 #440