joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Dynesty suddenly stopping #140

Closed jvines closed 4 years ago

jvines commented 5 years ago

Hi @joshspeagle ,

I'm trying to implement dynesty and I'm running into the following problem: dynesty apparently freezes for no reason mid-run. In the latest run I had, it stopped at this status:

iter: 28277 | bound: 62 | nc: 21 | ncall: 127176 | eff(%): 22.235 | loglstar: -inf < -2688.072 < inf | logz: -2720.777 +/- nan | dlogz: 340.207 > 1.009

Which is weird to me. At no point do I get a warning or error message; it just stops there.

I'm currently trying to fit a 5-dimensional model to data, and I tried both the nested sampling and dynamic nested sampling methods.

My call to dynesty is the following:

dsampler = dynesty.NestedSampler(
    dloglike, dprior, ndim,
    ptform_args=[empmir.dlogp, logp_params],
    logl_args=[empmir.logl_rv, logl_params],
    bound='multi', nlive=1000)

dsampler.run_nested()

Is there any way you could help me with this??

Thank you!

joshspeagle commented 5 years ago

That is quite strange. Would you be able to send me a script or more info so I can troubleshoot this on my end?

jvines commented 5 years ago

Hi Josh, sorry for not coming back to this. I got it to work, seemingly at random, by tweaking some stuff. Bottom line is that it now works and I'm not sure why. The base code in which I was implementing dynesty is being reworked. Once that is finished I'll re-add dynesty to it and test it again. I'll come back with updates once that happens.

joshspeagle commented 5 years ago

Okay, sounds good. I'll close this for now then.

dianadianadiana commented 5 years ago

Hi @jvines and @joshspeagle,

Just wanted to add that I also have the exact same problem of the run just stopping and freezing midway. I also have 5 or 6 parameters, depending on the data set. I've noticed that if I rerun dynesty, it sometimes works and doesn't get stuck. Still, I would like to know what causes it to get stuck in the first place, and whether there is a way to catch when it gets stuck so I can automatically rerun it without having to check on it often. By the way, I have the latest version. Let me know if I should share something on my end.

Thanks! Diana
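On the question of catching a stuck run and rerunning it automatically, one possible approach is a wall-clock watchdog around the whole fit. The sketch below is not part of dynesty or juliet: it assumes the fit lives in its own script (the name `fit_script.py` in the usage note is hypothetical) and simply kills and restarts the child process when it exceeds a time limit. It cannot tell a genuine hang from a slow run, so the timeout has to be chosen generously.

```python
import subprocess
import sys


def run_with_watchdog(cmd, timeout_s, max_attempts=3):
    """Run `cmd` in a child process; kill and retry it if it exceeds `timeout_s`.

    Returns the 1-based number of the attempt that finished in time.
    Raises RuntimeError if every attempt timed out. A non-zero exit code
    is treated as a crash, not a hang, and propagates as CalledProcessError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            # subprocess.run kills the child itself when the timeout expires.
            subprocess.run(cmd, timeout=timeout_s, check=True)
            return attempt
        except subprocess.TimeoutExpired:
            print(f"attempt {attempt} exceeded {timeout_s} s; restarting",
                  file=sys.stderr)
    raise RuntimeError(f"no attempt finished within {timeout_s} s")
```

A call such as `run_with_watchdog([sys.executable, "fit_script.py"], timeout_s=3600)` would then retry the fit up to three times. Only the direct child is killed, so with nthreads>1 any worker processes the script started may need separate cleanup.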

joshspeagle commented 5 years ago

Okay, good to know someone else is having this problem. If possible, it'd be great to get a copy of the run with a fixed random number seed so I can reproduce it. Barring that, if you are able to save any information from the run before it freezes (e.g., live points, the results object, etc.), that will help me debug.

dianadianadiana commented 5 years ago

OK sounds good. I will send an email about this, if that's ok since it might be a bit more complicated to send the material.

But for GitHub purposes: I'm using juliet (https://github.com/nespinoza/juliet), a tool that allows for joint model fits between radial velocities and photometry. The flags that I'm using for dynesty are: dynamic; 1000 live points; bound=multi; sampling=rwalk; nthreads>1 (ranges from 3-10)

and we are using the basic setup of:

sampler = dynesty.DynamicNestedSampler(loglike, prior, n_params, nlive=n_live_points, bound=dynesty_bound, sample=dynesty_sample)
sampler.run_nested()
results = sampler.results

joshspeagle commented 5 years ago

Okay, good to know. Look forward to seeing your email and trying to get to the bottom of this.

jvines commented 5 years ago

Hi @joshspeagle, are there any updates regarding this?

joshspeagle commented 5 years ago

I've had trouble pinning down why exactly this is failing, but I am aiming to return to it today/tomorrow as I prepare for the next dynesty release.

dianadianadiana commented 5 years ago

Maybe the problem will somehow be solved with the next release. I already mentioned this to Josh, but to share with @jvines: I've noticed that the freezing always happens around iteration 1800 (or up to 2000). If the run gets past iteration 2000, it is safe and runs fine. Just in case you also saw this pattern.

joshspeagle commented 5 years ago

Okay, I've just pushed a new release to GitHub and will hopefully move it to PyPI if nothing catastrophic is wrong. If you want to update and see if this problem persists, that'd be great. If it does, something similar to the last files/logs you sent me @dianadianadiana should be helpful to further pinpoint the problem.

jvines commented 5 years ago

Awesome. I'll try and test this week if it's solved, if not I'll let you know.

jvines commented 5 years ago

Hi @joshspeagle

I've tested the new build and, while it doesn't get stuck anymore (not for huge periods of time, at least), it throws an error about the input precision matrix defining an ellipsoid being singular.

This happened using the dynamic nested sampler (with the regular nested sampler it still gets stuck for long periods of time, > 1 hr).

This happened around iteration 6600, at an efficiency of 2.7%.

Do you have any insight as to why this could be happening??

Thanks.

jvines commented 5 years ago

I'm using the code hosted at

https://github.com/macapujol/astroEMPEROR

In particular, ais.py

Thank you!

joshspeagle commented 5 years ago

Okay, that's good to know. That implies that the issue definitely is constructing bounding ellipsoids, so the distribution of live points must be causing problems. If you can freeze a snapshot of exactly when this happens (or send me a copy of the results dictionary from the sampler), I might be able to do something depending on whether it's a result of numerical precision (one dimension is super small), large correlations (2+ dimensions are essentially co-planar), etc.
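To tell the two cases apart from a saved set of live points, something along these lines could be used. It is a rough diagnostic sketch, not dynesty code: the function name, the returned keys and the `rel_tol` threshold are all choices made for this example, and the live points are assumed to be available as an array of shape (n_points, n_dim).

```python
import numpy as np


def diagnose_live_points(points, rel_tol=1e-12):
    """Summarise the eigenvalue spectrum of the live points' covariance.

    `points` has shape (n_points, n_dim). `rel_tol` is an arbitrary
    threshold chosen for this sketch.
    """
    points = np.asarray(points, dtype=float)
    cov = np.cov(points, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)  # returned in ascending order
    largest = eigvals[-1]
    return {
        "eigvals": eigvals,
        "all_finite": bool(np.all(np.isfinite(eigvals))),
        # Directions whose variance is negligible relative to the largest one.
        "n_degenerate": int(np.sum(eigvals <= rel_tol * largest)),
        # Spread along each parameter axis on its own.
        "per_dim_std": points.std(axis=0, ddof=1),
    }
```

A tiny entry in `per_dim_std` points towards the "one dimension is super small" case, while a non-zero `n_degenerate` with ordinary per-dimension spreads points towards strongly correlated (near co-planar) dimensions.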

joshspeagle commented 5 years ago

Just for reference, this error is thrown here because the condition

l, v = lalg.eigh(self.cov)
if np.all((l > 0.) & (np.isfinite(l))):

fails. However, I specifically have a check here that tries to resolve this issue by mixing in the identity matrix, which explicitly checks this exact same condition before accepting the matrix. The fact that this fails is somewhat worrisome, so some type of traceback (even just copy-pasting the error) would be helpful.
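For readers unfamiliar with the idea of mixing in the identity matrix, here is a minimal illustration. It is not dynesty's actual implementation: the function names and the schedule of mixing weights are assumptions made for this sketch, and it uses `numpy.linalg.eigvalsh` rather than the `lalg.eigh` call quoted above.

```python
import numpy as np


def is_positive_definite(cov):
    """Eigenvalue test: every eigenvalue is finite and strictly positive."""
    eigvals = np.linalg.eigvalsh(cov)
    return bool(np.all((eigvals > 0.0) & np.isfinite(eigvals)))


def regularize_covariance(cov):
    """Blend `cov` with the identity until it passes the eigenvalue test.

    Returns (matrix, weight), where weight is the identity fraction used.
    The weight schedule (0, 1e-12, ..., 1) is an assumption of this sketch.
    """
    cov = np.asarray(cov, dtype=float)
    if not np.all(np.isfinite(cov)):
        raise ValueError("covariance contains NaN or inf")
    identity = np.eye(cov.shape[0])
    weights = [0.0] + [10.0 ** -p for p in range(12, -1, -1)]
    for w in weights:
        candidate = (1.0 - w) * cov + w * identity
        if is_positive_definite(candidate):
            return candidate, w
    raise RuntimeError("covariance could not be regularised")
```

With a finite input the last weight (1.0) yields the identity itself, so the loop always ends with a matrix that passes the test. The matrix returned is only a numerically safe stand-in for the original covariance.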

jvines commented 5 years ago

Hi @joshspeagle , I wanted to ask you, is it normal that the last N samples are all the same after a run??? Where N can be a large number like 1000 or even more.

joshspeagle commented 5 years ago

Can you show me an example of this? It is decidedly not normal as default behavior and I haven't encountered this before.
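One way to put a number on this before sharing an example is to count how many rows at the end of the sample array are identical. The helper below is a small sketch with a made-up name; it assumes the samples are available as an array of shape (n_samples, n_dim), for instance the samples stored in the results object.

```python
import numpy as np


def count_trailing_duplicates(samples):
    """Return how many rows at the end of `samples` equal the final row."""
    samples = np.atleast_2d(np.asarray(samples))
    same_as_last = np.all(samples == samples[-1], axis=1)
    # Index of the last row that differs from the final one (-1 if none does).
    differing = np.flatnonzero(~same_as_last)
    last_diff = differing[-1] if differing.size else -1
    return int(samples.shape[0] - 1 - last_diff)
```

A result of 1 means the final sample is unique; a result in the hundreds or thousands would confirm the behaviour described above.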

jvines commented 5 years ago

I can send you the dynesty output to your email. Should I send it to the one listed on your profile?

joshspeagle commented 4 years ago

Copying it to the issue page works for me, but you can certainly also reach me at jspeagle@cfa.harvard.edu if you want to send me more details and discuss things there.

MrBudgens commented 4 years ago

I am having a similar problem to that described at the top of this thread. My model has 5 or 6 parameters, and a dynamic sampler run with default settings typically finishes within an hour and produces sensible posteriors. However, sometimes the output ceases to update, and the run fails to finish even when left overnight.

I got my logprob function to count and report the number of calls to it, and discovered that it continues to run after dynesty has ceased to produce output. This happens even when I limit the number of iterations -- for example setting (maxiter_init=10000, maxiter_batch=4000, maxbatch=10, use_stop=False) resulted in two similar runs finishing within an hour, while another two continued overnight and my logprob function was reporting at least 150,000 iterations for each by the time I killed them.

I'm only just learning how to use dynesty, so I can't rule out some mistake on my part, but it seemed helpful to report that the sampler appears to be exceeding the iteration limits by a large margin.
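The call counting described above can be done with a small wrapper around the log-probability function. This is a generic sketch, not a dynesty feature: `count_calls`, `CallBudgetExceeded` and `max_calls` are names invented for the example. The optional budget turns a silent runaway into an exception with a traceback.

```python
import functools


class CallBudgetExceeded(RuntimeError):
    """Raised when the wrapped function is called more often than allowed."""


def count_calls(func, max_calls=None):
    """Wrap `func` so that every evaluation is counted on `wrapper.ncalls`.

    If `max_calls` is given, any call made after the budget is used up
    raises CallBudgetExceeded instead of evaluating `func`.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if max_calls is not None and wrapper.ncalls >= max_calls:
            raise CallBudgetExceeded(
                f"{func.__name__} called more than {max_calls} times")
        wrapper.ncalls += 1
        return func(*args, **kwargs)

    wrapper.ncalls = 0
    return wrapper
```

The wrapped function is passed to the sampler in place of the original, and `wrapper.ncalls` can be compared with the sampler's own reported call count. This is meant for single-process runs: with a worker pool each process would keep its own counter, and the nested wrapper may not pickle.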

joshspeagle commented 4 years ago

No, it's almost definitely not you. Tracking down this bug and the ellipsoid bug has been time-consuming and difficult, but I'm hopeful that I'll be able to finally resolve those soon. Thanks for re-bumping this so I know I should try and focus on a fix sooner rather than later. If you want to send me any additional files to help troubleshoot that'd be great but not necessary thanks to contributions/files from other patient users.

JonasKemmer commented 4 years ago

Hey, thanks for your effort in finding the cause of this bug. I am using juliet, which Diana mentions, as well, and have the same issues as her. It still often gets stuck at ~1800 iterations and isn't able to finish anymore. Restarting sometimes helps.

joshspeagle commented 4 years ago

I'm absolutely slammed right now but plan to push through some changes this upcoming weekend (probably Sunday) that should at least take a stab at this problem. I have located (and hopefully my bugfix actually fixes) a decomposition bug that sometimes can lead to this behavior, and am optimistic that I will be able to track down any other locations in the code this endless function proposal could be occurring. If you could send me an email with (1) a quick traceback (just canceling if/when it gets stuck) and/or (2) the configuration you're running with, that'd be great.
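Cancelling a stuck run by hand gives the traceback requested here. For unattended runs, an alternative (a different technique from cancelling by hand) is Python's standard `faulthandler` module, which can write the stack of every thread to a file after a time limit. The helper names below are invented for this sketch.

```python
import faulthandler


def arm_hang_dump(timeout_s, path):
    """Dump every thread's stack to `path` if still running after `timeout_s`.

    Returns the open log file; keep it open until disarm_hang_dump() is called.
    """
    log = open(path, "w")
    # repeat=True writes a fresh dump every `timeout_s` seconds, so a stuck
    # loop shows up as the same frames appearing again and again.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
    return log


def disarm_hang_dump(log):
    """Cancel the pending dump and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log.close()
```

Calling `arm_hang_dump` just before `sampler.run_nested()` and `disarm_hang_dump` right after it leaves a log showing where the process was spending its time if the run stalls. Only the process that armed the dump is covered, so workers started for nthreads>1 would not appear in it.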