joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

No bounds update and fixed number of calls each iteration #431

Closed ajw278 closed 1 year ago

ajw278 commented 1 year ago

Dynesty version: Installed the latest version of dynesty via pip for Python 3.

Describe the bug: When running dynesty (either dynamic or static), the bounds only update once regardless of update_interval, and the number of calls per iteration becomes fixed. This happens with a variety of different setups. I am working with a very messy, multi-modal log-likelihood distribution that can be very negative (log L of order minus several thousand in some cases).

Setup: I have run both the dynamic and static versions of dynesty and tried different values of update_interval, etc. However, the bounds do not update, and the number of calls remains fixed for each iteration. The update interval is set to 500, and the bound is 'multi' (I have also had problems with 'cubes', but that's a separate issue).

I am running the following with:

nprocs = 16 (but I have tried other values)
ndim = 16 (but I have tried other values)
dynamic = True (but I have tried static)

```python
import os
import multiprocessing

import dynesty as dnst

# setglobals, dmc, rstart, rend, distance_, rescont, priors, Ngauss, log_prob and
# prior_transform are defined elsewhere in my code (not included here).
pool = multiprocessing.Pool(processes=nprocs, initializer=setglobals,
                            initargs=(dmc, rstart, rend, distance_, rescont, priors, Ngauss))
pool.size = nprocs

if not os.path.isdir('dmaps'):
    os.makedirs('dmaps')

resume = False
if not load or not os.path.isfile('dmaps/' + filename + '.save'):
    print("Initiating new dynesty run...")
    if not dynamic:
        sampler = dnst.NestedSampler(log_prob, prior_transform, ndim, nlive=nlive, bound='multi',
                                     update_interval=500,
                                     first_update={'min_ncall': 5000, 'min_eff': 25.}, pool=pool)
    else:
        sampler = dnst.DynamicNestedSampler(log_prob, prior_transform, ndim, bound='multi',
                                            update_interval=500,
                                            first_update={'min_ncall': 5000, 'min_eff': 25.}, pool=pool)
else:
    print("Loading previous dynesty exploration...")
    sampler = dnst.NestedSampler.restore('dmaps/' + filename + '.save', pool=pool)
    resume = True

if run:
    print('Running dynesty...')
    if not dynamic:
        sampler.run_nested(maxiter=max_n, dlogz=0.1, checkpoint_file='dmaps/' + filename + '.save',
                           resume=resume, maxcall=1000000)
    else:
        sampler.run_nested(maxiter=max_n, dlogz_init=0.1, nlive_init=nlive, nlive_batch=100,
                           checkpoint_file='dmaps/' + filename + '.save', resume=resume,
                           maxiter_init=10000, maxiter_batch=1000, maxbatch=10, use_stop=True,
                           wt_kwargs={'pfrac': 0.0}, maxcall=1000000)
```

Dynesty output: It isn't crashing, but here is an example (in this case, nc is 36 for every iteration, but I have seen other numbers).

This is what happens when eff < 25 % (the threshold I set):

iter: 2157 | batch: 0 | bound: 0 | nc: 7 | ncall: 7641 | eff(%): 24.962 | loglstar: -inf < -988.284 < inf | logz: -995.568 +/- nan | dlogz: 883.928 > 0.100
iter: 2158 | batch: 0 | bound: 1 | nc: 43 | ncall: 7684 | eff(%): 24.850 | loglstar: -inf < -987.709 < inf | logz: -995.358 +/- nan | dlogz: 883.761 > 0.100
iter: 2159 | batch: 0 | bound: 1 | nc: 36 | ncall: 7720 | eff(%): 24.759 | loglstar: -inf < -987.666 < inf | logz: -995.137 +/- nan | dlogz: 883.550 > 0.100
iter: 2160 | batch: 0 | bound: 1 | nc: 36 | ncall: 7756 | eff(%): 24.669 | loglstar: -inf < -987.448 < inf | logz: -994.932 +/- nan | dlogz: 883.327 > 0.100

Then this stays like this forever:

iter: 6554 | batch: 0 | bound: 1 | nc: 36 | ncall: 168020 | eff(%): 3.901 | loglstar: -inf < -287.810 < inf | logz: -298.154 +/- nan | dlogz: 225.024 > 0.100
iter: 6555 | batch: 0 | bound: 1 | nc: 36 | ncall: 168056 | eff(%): 3.900 | loglstar: -inf < -287.781 < inf | logz: -298.110 +/- nan | dlogz: 224.978 > 0.100
iter: 6556 | batch: 0 | bound: 1 | nc: 36 | ncall: 168092 | eff(%): 3.900 | loglstar: -inf < -287.743 < inf | logz: -298.066 +/- nan | dlogz: 224.933 > 0.100

Bug: There is no error, but convergence will take forever, and I am confused about why this happens: the bounds never update, which leads to the low efficiency.

The behaviour I expected was updated bounds and a varying efficiency over time (i.e. nc changes stochastically).

I have not provided the full code/data, but can provide it privately if necessary.

segasai commented 1 year ago

Hi,

Thanks for the report. I'm currently trying to refactor/streamline/fix the bounding update code in #427 #425 #428.

I think there is certainly an issue there; I'm just trying to nail down the proper fix. If you can privately send me code that I could run myself, that'd be helpful.

Thanks

ajw278 commented 1 year ago

Hi Sergey,

Thanks very much for getting back to me.

I've attached a tarred version of the code. Apologies that it is a mess; it is very much in development. Unfortunately it's not easy to show you this without everything, since you need the underlying data. It should just run in the main directory:

python dmapper.py -m ns -N 3 -p 16 -d -R 1000

-m is the mode (ns for nested sampling)
-N is the number of Gaussians (not important)
-p is the number of processes (I set it to 16)
-d means dynamic
-R is the (initial) number of live points

The main file that is relevant is 'dsolver_ns.py', where the call is made to execute_ns, which is where the sampling is run.

There are some dependencies (astropy etc.) that should be easy enough, but let me know if anything doesn't work.

As a side note, I would really like to use 'cubes' for the bounds, but I had a similar problem to someone else online where the KDE call threw a complex number error. I think it's sensible to focus on one issue at a time, but I thought I'd mention it in case it comes up.

Best, Andrew

segasai commented 1 year ago

I don't think I received the attachment; presumably GitHub strips them. But in the meantime, could you try this branch: https://github.com/joshspeagle/dynesty/tree/ellipsoid_fix ? I'm wondering if that fixes your problem.
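
(If it helps, one way to install that branch directly, assuming a git-backed pip install is acceptable in your environment:

pip install git+https://github.com/joshspeagle/dynesty.git@ellipsoid_fix
)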

ajw278 commented 1 year ago

Ah, I see -- I've made this drive link public for now:

https://drive.google.com/file/d/10uoa3gfAR9qWfwij_7S0dTseoUxPikBt/view?usp=sharing

You should be able to download it there. I will also try the branch you mentioned, thanks!

ajw278 commented 1 year ago

Hi, so it seems like the problem hasn't gone away, but changed. Now, when the efficiency goes below 25 per cent (min_ncall = 5000), the bounds are not updated at all:

2092it [26:56, 1.27s/it, batch: 0 | bound: 0 | nc: 14 | ncall: 7342 | eff(%): 25.078 | loglstar: -inf < -1017.622 < inf | logz: -1025.676 +/- nan |
2093it [26:58, 1.30s/it, batch: 0 | bound: 0 | nc: 14 | ncall: 7356 | eff(%): 25.048 | loglstar: -inf < -1017.424 < inf | logz: -1025.318 +/- nan |
2094it [27:01, 1.62s/it, batch: 0 | bound: 0 | nc: 12 | ncall: 7368 | eff(%): 25.024 | loglstar: -inf < -1017.402 < inf | logz: -1025.030 +/- nan |
2095it [27:10, 3.10s/it, batch: 0 | bound: 0 | nc: 34 | ncall: 7402 | eff(%): 24.935 | loglstar: -inf < -1017.381 < inf | logz: -1024.803 +/- nan |
2096it [27:13, 3.07s/it, batch: 0 | bound: 0 | nc: 8 | ncall: 7410 | eff(%): 24.923 | loglstar: -inf < -1017.102 < inf | logz: -1024.589 +/- nan |
2098it [27:19, 2.97s/it, batch: 0 | bound: 0 | nc: 9 | ncall: 7425 | eff(%): 24.902 | loglstar: -inf < -1016.605 < inf | logz: -1024.122 +/- nan |
2100it [27:24, 2.90s/it, batch: 0 | bound: 0 | nc: 31 | ncall: 7458 | eff(%): 24.829 | loglstar: -inf < -1016.524 < inf | logz: -1023.739 +/- nan |
2101it [27:28, 3.00s/it, batch: 0 | bound: 0 | nc: 16 | ncall: 7474 | eff(%): 24.793 | loglstar: -inf < -1016.497 < inf | logz: -1023.583 +/- nan |
104it [27:30, 2.01s/it, batch: 0 | bound: 0 | nc: 3 | ncall: 7489 | eff(%): 24.785 | loglstar: -inf < -1015.960 < inf | logz: -1023.170 +/- nan |

segasai commented 1 year ago

Thanks for the code and the test; the https://github.com/joshspeagle/dynesty/tree/ellipsoid_fix branch should now fix it. There was definitely a bug where update_interval was incorrectly set for dynamic sampling. As the branch contains some refactoring as well, it's not yet final.

ajw278 commented 1 year ago

Thanks - it seems that with that version the bounds are now updating, but the updated bounds somehow make sampling less efficient, and the run then settles into the same behaviour as before (i.e. a fixed number of calls each iteration):

2081it [28:02, 1.43s/it, batch: 0 | bound: 0 | nc: 10 | ncall: 7418 | eff(%): 25.018 | loglstar: -inf < -990.726 < inf | logz: -998.397 +/- nan |
2083it [28:04, 1.27s/it, batch: 0 | bound: 0 | nc: 4 | ncall: 7425 | eff(%): 25.021 | loglstar: -inf < -990.299 < inf | logz: -997.895 +/- nan |
2086it [28:08, 1.33s/it, batch: 0 | bound: 0 | nc: 16 | ncall: 7452 | eff(%): 24.976 | loglstar: -inf < -989.168 < inf | logz: -997.154 +/- nan |
2088it [28:09, 1.18s/it, batch: 0 | bound: 1 | nc: 38 | ncall: 7492 | eff(%): 24.881 | loglstar: -inf < -988.763 < inf | logz: -996.544 +/- nan |
2104it [28:11, 2.64it/s, batch: 0 | bound: 2 | nc: 36 | ncall: 8068 | eff(%): 23.461 | loglstar: -inf < -984.964 < inf | logz: -992.766 +/- nan |
2120it [28:13, 4.10it/s, batch: 0 | bound: 3 | nc: 36 | ncall: 8644 | eff(%): 22.213 | loglstar: -inf < -980.443 < inf | logz: -987.999 +/- nan |
2136it [28:14, 5.47it/s, batch: 0 | bound: 4 | nc: 36 | ncall: 9220 | eff(%): 21.107 | loglstar: -inf < -976.901 < inf | logz: -984.507 +/- nan |
2152it [28:16, 6.72it/s, batch: 0 | bound: 5 | nc: 36 | ncall: 9796 | eff(%): 20.120 | loglstar: -inf < -972.974 < inf | logz: -980.418 +/- nan |
2168it [28:18, 7.31it/s, batch: 0 | bound: 6 | nc: 36 | ncall: 10372 | eff(%): 19.233 | loglstar: -inf < -968.066 < inf | logz: -975.588 +/- nan |

Perhaps this is an issue with my specific problem, but it would be good to understand it if possible. All help is very much appreciated.

segasai commented 1 year ago

It is not surprising that the efficiency goes down as you progress through the posterior, since the ellipsoidal approximation becomes less accurate. Keep in mind that the efficiency is a 'cumulative' efficiency, i.e. measured from the very start of the run, so I'm not seeing evidence of an issue here. Regarding the constant nc, it is not obvious why that is the case, but it is not impossible, so again, unless there is evidence of an issue, I'm not sure there is much to do.
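
To make the 'cumulative' point concrete: the printed eff(%) is the total number of accepted iterations divided by the total number of likelihood calls since the start of the run. A quick check against one of the log lines quoted earlier in this thread:

```python
# Cumulative sampling efficiency: accepted iterations / total likelihood calls so far.
# Values taken from the "iter: 6554 | ... | ncall: 168020" line in the log above.
niter, ncall = 6554, 168020
print(f"eff(%) = {100 * niter / ncall:.3f}")  # prints 3.901, matching the log output
```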

ajw278 commented 1 year ago

I understand the decrease in efficiency (although that shouldn't happen after one bounds update, because it would imply the bounded region being searched is larger than the unit cube that is searched initially). However, surely it is impossible for nc to remain fixed forever? This behaviour continues regardless of the number of bound updates or iterations, which I do not understand. There should at least be some random scatter from sampling.

segasai commented 1 year ago

Okay, I was confused - I thought you were using the uniform sampler, but you are actually using the automatic sampler, which for this number of dimensions means rwalk. The way rwalk is implemented is that it walks for a fixed number of steps; by default that number is determined by the dimensionality as 20 + ndim, hence 36.
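
For anyone landing here later, a minimal sketch of how the number of random-walk steps can be set explicitly via the sample and walks keywords (reusing log_prob, prior_transform, ndim and pool from the snippet earlier in the thread; walks=50 is just an arbitrary example value, not a recommendation):

```python
import dynesty as dnst

# Select the random-walk sampler explicitly and set the number of steps per proposal.
# With the default (walks = 20 + ndim), ndim = 16 gives the fixed nc = 36 seen above.
sampler = dnst.DynamicNestedSampler(log_prob, prior_transform, ndim,
                                    bound='multi', sample='rwalk', walks=50,
                                    pool=pool)
```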

ajw278 commented 1 year ago

Ah, I see - thanks, I missed this in the manual, sorry. Thanks again for your help!

Best, Andrew

segasai commented 1 year ago

Thanks for the original report, Andrew. I will close the issue in the meantime.