joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Unable to load previous run during batch stage? #436

Closed by ajw278 1 year ago

ajw278 commented 1 year ago

Dynesty version: installed the ellipsoid_fix branch (clone, setup build, setup install)

Your question: It seems that during the batch stage of a dynamic run, the sampler is unable to load previous progress. When I try, I get this error:

File "/home/awinter/Documents/dustmapper/dsolver_ns.py", line 231, in execute_NS
    sampler.run_nested(maxiter=max_n,dlogz_init=20., nlive_init=nlive, nlive_batch=200,checkpoint_file='dmaps/'+filename+'.save',\
File "/home/awinter/anaconda3/lib/python3.10/site-packages/dynesty-2.1.1-py3.10.egg/dynesty/dynamicsampler.py", line 2085, in run_nested
File "/home/awinter/anaconda3/lib/python3.10/site-packages/dynesty-2.1.1-py3.10.egg/dynesty/dynamicsampler.py", line 2253, in add_batch
File "/home/awinter/anaconda3/lib/python3.10/site-packages/dynesty-2.1.1-py3.10.egg/dynesty/dynamicsampler.py", line 1560, in sample_batch
AttributeError: 'DynamicNestedSampler' object has no attribute 'batch_sampler'

To be clear, I am running dynesty in the following way:

pool = multiprocessing.Pool(processes=nprocs, initializer=set_globals, initargs=(dmc_, rstart, rend, distance_, res_cont_, priors, Ngauss))
pool.size = nprocs
ndim = len(p0)

...

sampler = dnst.NestedSampler.restore('dmaps/'+filename+'.save', pool=pool)
resume=True
sampler.run_nested(maxiter=max_n,dlogz_init=20., nlive_init=nlive, nlive_batch=200,checkpoint_file='dmaps/'+filename+'.save',\
             resume=resume,  maxiter_init=100000, maxiter_batch=1000, maxbatch=20,  use_stop=True,wt_kwargs={'pfrac': 0.8},stop_kwargs={'pfrac':0.0}, maxcall=int(1e7))

My initial setup for the sampler in a previous run was:

sampler = dnst.DynamicNestedSampler(log_prob, prior_transform, ndim, bound='balls', sample='rwalk',
                                    update_interval=200, first_update={'min_ncall': 5000, 'min_eff': 10.}, pool=pool)

I believe this fails because the 'RunRecord' class in utils does not save batch_sampler. Is there an inherent or practical reason for this, or has it simply not been set up to do so yet?

Many thanks, Andrew

segasai commented 1 year ago

It seems you are using the wrong restore method, "dnst.NestedSampler.restore", instead of the dynamic sampler's restore.

ajw278 commented 1 year ago

Apologies, silly mistake. Thanks.

segasai commented 1 year ago

Actually, I'm not sure that which restore you use makes a difference, so there may still be a bug there.

ajw278 commented 1 year ago

Actually, yes, changing to DynamicNestedSampler.restore does not seem to help. However, it is unclear whether I already 'corrupted' the initial file by trying to load it with NestedSampler. If that cannot be the case, then it seems there is a bug.

segasai commented 1 year ago

I don't think the file will be corrupted this way.

Do you have a way to reproduce this issue?

Did you actively interrupt this run, or did it stop due to, say, the maxcall or maxiter limit?

ajw278 commented 1 year ago

In that case I would think it is a bug.

Reproducing is complicated because my run takes a long time to complete the initial stages. The code is also quite large and depends on data files. Could I provide my code, data, and the save file from the previous run privately?

It ended because the maximum number of batches had been executed. I wanted to try extending the number of batches (from 10 to 20).

ajw278 commented 1 year ago

PS. If you have a test problem with the dynamic run, you could try restarting it and extending the number of batches to see if it does the same thing.

segasai commented 1 year ago

Okay, that's the problem.

The resume option can only be used to resume an interrupted run, not to continue a run with different parameters.

I can now even see where the error is coming from. The batch sampling had finished, so the batch_sampler was deleted. There is nothing to resume.
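To illustrate the mechanism, here is a toy sketch in plain Python. ToyDynamicSampler and all of its methods are hypothetical stand-ins and not dynesty's actual implementation; the only thing borrowed from dynesty is the attribute name batch_sampler. The sketch assumes the per-batch state exists only while a batch is in progress and is deleted once the batch completes.

```python
class ToyDynamicSampler:
    """Hypothetical stand-in; this is NOT dynesty's real code."""

    def __init__(self):
        self.batches_done = 0

    def run(self, maxbatch, interrupt_in_batch=None):
        while self.batches_done < maxbatch:
            # Per-batch state exists only while the batch is in progress.
            self.batch_sampler = {"batch": self.batches_done}
            if interrupt_in_batch == self.batches_done:
                # Simulate an interruption: the per-batch state survives.
                return
            self.batches_done += 1
            # The batch completed, so its temporary state is removed.
            del self.batch_sampler

    def resume(self):
        # Resuming assumes a batch was interrupted part-way through.
        return self.batch_sampler["batch"]


# An interrupted run still has a batch to pick up.
interrupted = ToyDynamicSampler()
interrupted.run(maxbatch=10, interrupt_in_batch=3)
print("resuming batch", interrupted.resume())

# A run that finished all its batches has no per-batch state left.
finished = ToyDynamicSampler()
finished.run(maxbatch=10)
try:
    finished.resume()
except AttributeError as err:
    print("cannot resume:", err)
```

In the toy version, raising maxbatch and calling resume on the finished sampler fails in the same way as the traceback above, because the attribute it needs no longer exists.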

segasai commented 1 year ago

It is certainly fixable, but the whole save/restore logic is very tricky because of the number of possible states of the system. So I think the only bug here is that a more helpful error message is needed.

ajw278 commented 1 year ago

I thought it would be something like this - thanks for explaining!

segasai commented 1 year ago

Basically, if you want to continue running the sampler (adding more batches) after a successful run, I think you can still .restore and do add_batch(). I have also committed a change, 1005fd1d237f305172edc57c603903300ea00585, that issues a warning and exits if you try to resume a nested run that has already finished.
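A minimal sketch of that workaround, assuming the checkpoint was written by a DynamicNestedSampler run that finished normally. The helper name extend_finished_run and its arguments are hypothetical. DynamicNestedSampler.restore and add_batch are existing dynesty methods, but whether this works for every saved state has not been verified here. The batch settings mirror the nlive_batch, maxiter_batch and wt_kwargs values from the original run_nested call.

```python
def extend_finished_run(checkpoint_path, n_extra_batches, pool=None):
    """Hypothetical helper: add batches to a run that already finished."""
    # Imported inside the function so the helper can be defined
    # even where dynesty is not importable.
    import dynesty

    # Use the dynamic sampler's restore, not NestedSampler.restore.
    sampler = dynesty.DynamicNestedSampler.restore(checkpoint_path, pool=pool)

    for _ in range(n_extra_batches):
        # Call add_batch directly instead of run_nested(resume=True),
        # since a finished run has no interrupted batch to resume.
        sampler.add_batch(nlive=200, maxiter=1000, wt_kwargs={'pfrac': 0.8})

    return sampler
```

For the case in this issue, going from 10 to 20 batches would be a call like extend_finished_run('dmaps/'+filename+'.save', 10, pool=pool).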

ajw278 commented 1 year ago

Thanks!