joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Dynamic Nested Sampler not generating points #208

Closed Bamash closed 3 years ago

Bamash commented 4 years ago

Hi, I've been using dynesty's Dynamic Nested Sampler for an inference problem I'm solving. I was using it on a 7-D problem without any issues and have now moved on to a 24-D problem. What I'm now finding is that the sampler appears to get stuck, in the sense that it doesn't generate any new points for days on end, so the progress bar never changes. This happens when I don't specify a value of "maxiter" or when the value is greater than about 2000. I'm relatively new to using dynesty, so I'm not sure how to debug this. Do you have any advice? I've been using all the default settings, which leads me to think that's the issue.

joshspeagle commented 4 years ago

Would you be able to provide any more information on the issue? Approximately when/where is it getting stuck, and what is behavior up until that point? Any copy-pastes of relevant outputs (progressbar, print statements, etc.) would also be super helpful. Best would be if you can interrupt the run and pickle the sampler, which you might be able to send my way to debug in detail (provided I can reproduce your likelihood and environment).
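As a rough illustration of "interrupt the run and pickle the sampler", here is a minimal sketch. The helper name `run_and_save` and the file path are hypothetical, and whether a given sampler pickles cleanly depends on its contents (for example, the likelihood function must itself be picklable).

```python
import pickle


def run_and_save(sampler, path):
    """Run `sampler`; if the run is interrupted with Ctrl-C, pickle it to `path`.

    Returns True if the run finished, False if it was interrupted and saved.
    """
    try:
        sampler.run_nested()
    except KeyboardInterrupt:
        # Save the sampler's current state so it can be inspected or shared.
        with open(path, "wb") as f:
            pickle.dump(sampler, f)
        return False
    return True
```

The saved file can later be reloaded with `pickle.load` in an environment where the same likelihood code is importable.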

One issue that I would raise immediately is that, when dealing with such a high-dimensional problem, you almost certainly need to increase the number of live points from the default values to guarantee stable behavior. The rule of thumb I use is a few times N^2, where N is the number of dimensions, which here (N = 24, N^2 = 576) is probably ~1-2k rather than the much smaller defaults.
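To make the rule of thumb concrete, here is a small sketch. The helper name `suggested_nlive` and the multipliers 2 and 3 (standing in for "a few") are illustrative assumptions, not part of dynesty.

```python
def suggested_nlive(ndim, factor=2):
    """Rule-of-thumb live-point count: a few times ndim squared.

    `factor` stands in for "a few" and is an illustrative choice.
    """
    return factor * ndim ** 2


# For the 24-D problem in this thread:
for factor in (2, 3):
    print(factor, suggested_nlive(24, factor))
```

The resulting number would then be passed to the sampler, for instance through the `nlive_init` argument of `run_nested`.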

Bamash commented 4 years ago

Ok, thanks for the advice! I've now got a few runs going with live points in the thousands to see if that helps. I'll update you on what the result is. If that doesn't help, I'll send the pickled runs over!

joshspeagle commented 4 years ago

Great. I would also confirm that you're using either the latest public version or the latest dev version, since they do contain some fixes that help to prevent runs hanging from bad ellipsoid decompositions (assuming you're using multi).

Bamash commented 4 years ago

I’ve double-checked to make sure that I’ve got the latest version!

I had four runs going earlier, each with a different number of live points: 1000, 2000, 5000 and 10,000. For the second and third runs, I am getting the following error:

 1843it [1:06:47,  2.17s/it, batch: 0 | bound: 2 | nc: 637 | ncall: 35471 | eff(%):  5.124 | loglstar:   -inf < -14.113 <    inf | logz: -17.017 +/-  0.081 | dlogz:  4.915 >  0.010]   

  File "/Users/johannesheyl/Dropbox/PhD/BAMBI/Sulphur_expanded_network/BAMBI_trial.py", line 103, in generate_new_nested_sampling_data # This command is from my code!
    self.sampler.run_nested()

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 1619, in run_nested
    for results in self.sample_initial(nlive=nlive_init,

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/dynamicsampler.py", line 838, in sample_initial
    for it, results in enumerate(self.sampler.sample(maxiter=maxiter,

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/sampler.py", line 782, in sample
    u, v, logl, nc = self._new_point(loglstar_new, logvol)

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/sampler.py", line 380, in _new_point
    u, v, logl, nc, blob = self._get_point_value(loglstar)

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/sampler.py", line 364, in _get_point_value
    self._fill_queue(loglstar)

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/sampler.py", line 353, in _fill_queue
    self.queue = list(self.M(evolve_point, args))

  File "/opt/anaconda3/lib/python3.8/site-packages/dynesty/sampling.py", line 564, in sample_slice
    raise RuntimeError("Slice sampler has failed to find "

RuntimeError: Slice sampler has failed to find a valid point. Some useful output quantities:
u: [0.46118211 0.93330249 0.51512133 0.13024603 0.41321489 0.96080873
 0.10697359 0.13687092 0.49005103 0.75583209 0.75799675 0.74136701
 0.89167272 0.21786081 0.86100013 0.70452172 0.60999148 0.06585299
 0.22751916 0.13338184 0.81943203 0.51146267 0.31995703 0.46872475]
u_left: [0.46118211 0.93330249 0.51512133 0.13024603 0.41321489 0.96080873
 0.10697359 0.13687092 0.49005103 0.75583209 0.75799675 0.74136701
 0.89167272 0.21786081 0.86100013 0.70452172 0.60999148 0.06585299
 0.22751916 0.13338184 0.81943203 0.51146267 0.31995703 0.46872475]
u_right: [0.46118211 0.93330249 0.51512133 0.13024603 0.41321489 0.96080873
 0.10697359 0.13687092 0.49005103 0.75583209 0.75799675 0.74136701
 0.89167272 0.21786081 0.86100013 0.70452172 0.60999148 0.06585299
 0.22751916 0.13338184 0.81943203 0.51146267 0.31995703 0.46872475]
u_hat: [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
u_prop: [0.46118211 0.93330249 0.51512133 0.13024603 0.41321489 0.96080873
 0.10697359 0.13687092 0.49005103 0.75583209 0.75799675 0.74136701
 0.89167272 0.21786081 0.86100013 0.70452172 0.60999148 0.06585299
 0.22751916 0.13338184 0.81943203 0.51146267 0.31995703 0.46872475]
loglstar: -13.94783111860903
logl_prop: -15.19156364889188
axes: [[ 3.56101815e-03 -8.67697267e-03  1.11345572e-02 -1.63188364e-02
  -1.60944448e-02  2.47935891e-03  7.30276749e-03  4.43905974e-04
   6.20139046e-03 -1.19641517e-02  2.13307722e-03 -8.79254504e-03
  -2.72641634e-04 -4.56665718e-03 -7.81919170e-04 -1.61476361e-02
   4.11441859e-03 -1.06278369e-02 -2.14680532e-02  1.49856758e-04
   1.52854625e-02 -1.33164771e-02 -1.60028996e-02 -3.07858570e-03]
 [-1.77219066e-03  1.26824261e-02  2.46084047e-03 -5.84958501e-03
  -5.70193064e-03 -3.32886215e-03 -5.36215426e-03  6.43230910e-03
   2.40538214e-02  1.56734280e-02  2.43393500e-03  2.37389070e-03
   3.16641800e-03 -9.55339748e-03  4.30282038e-04  1.02844711e-02
   4.26328131e-03  1.43761168e-02 -2.78536219e-02  1.37778967e-02
  -2.33231125e-02  2.28094107e-03 -3.30849242e-03  2.23760792e-03]
 [-1.11313453e-02  6.47900393e-03  1.94222883e-02 -6.75361925e-03
   2.76129045e-03 -1.17705906e-02  3.80074091e-03 -4.79617854e-03
   6.65188746e-03  1.13955193e-02 -8.66929276e-03  8.06330481e-03
   1.29947186e-03  1.51281657e-02 -3.21376675e-03  1.83136382e-02
   1.44689748e-02 -2.38086746e-03 -7.38368781e-03 -1.24099887e-02
   2.75001253e-02  1.13821382e-02  6.51020891e-03  1.35057601e-02]
 [-1.51344144e-02 -2.51987249e-03  3.51525083e-03  1.15412538e-02
  -1.95301705e-02  1.91982639e-02  2.81040920e-03  1.58659071e-03
   1.39425663e-02  3.47256571e-03 -1.92005482e-03 -1.47755778e-02
  -9.57505413e-03 -9.95419939e-03 -7.29140335e-04  1.77356815e-02
   2.19820217e-02 -2.98186064e-02  1.61357512e-02 -9.14138054e-03
  -1.38219173e-02  6.09015469e-03 -4.66719786e-03 -7.53462862e-03]
 [-2.56558741e-02 -8.84740268e-03 -1.40939729e-03  8.09739873e-04
  -2.44956867e-02  3.69736907e-03 -3.39985640e-02 -1.02206099e-02
   1.40701130e-02 -2.00240180e-02  1.87331872e-02  4.01356919e-03
   3.59751422e-03  2.56846967e-02  1.77749934e-02  7.76541927e-03
  -1.45335762e-02  1.80594515e-02  7.33383186e-03  1.08935151e-02
   1.24808632e-02  1.71513584e-02 -1.32697197e-02 -1.68214815e-02]
 [-1.00250631e-02  2.27347283e-02 -2.68616621e-02 -9.93169069e-03
   2.35983973e-03  3.30776866e-02 -2.42129366e-03  8.73800287e-03
  -1.00453847e-02 -1.15014332e-02  1.11167606e-02 -3.54068738e-03
  -1.85691108e-02 -8.53748003e-03  1.52517220e-02  7.58694840e-03
  -1.34158570e-02 -5.66005134e-03 -3.36325468e-02 -2.37527817e-02
   1.38525533e-02  8.89253879e-03  3.29035990e-02 -3.18255469e-03]
 [-1.54210707e-02  1.26437325e-02 -1.14011400e-02 -1.38039706e-03
  -9.95920024e-03 -2.28809491e-02 -1.12498183e-02  1.21199419e-02
  -1.74384641e-02  1.23628042e-02 -2.32959547e-02 -1.65773336e-02
  -3.72813874e-02 -1.61859633e-02 -2.20016252e-02 -1.93011577e-02
   1.02945927e-02  2.87954159e-02  1.28499208e-03 -1.66835510e-02
   8.90770891e-03  2.13293965e-02 -2.23776077e-02 -2.36563902e-02]
 [ 4.77868590e-02  1.68218080e-02  1.34826662e-02  1.89694995e-02
  -3.79449288e-03  2.37927412e-02  1.39306613e-03  3.97628260e-02
  -3.33937668e-03  6.54926732e-04  1.87970252e-04 -2.42096623e-02
   2.11754807e-02  1.66440379e-02 -9.43739813e-03  1.20010514e-02
  -8.49791072e-04  6.40523437e-03 -7.29288955e-04  1.11138595e-02
   2.00285786e-02  2.88143396e-02 -9.69517468e-03 -1.54193864e-02]
 [-3.08237827e-02  7.20511416e-03 -2.40466075e-02  1.44561169e-02
  -1.54579629e-02  7.24683173e-03 -9.57506285e-04  2.54676753e-02
  -3.26675344e-03 -9.80637252e-03 -3.71320703e-02  1.43425455e-02
   6.87800888e-03  1.71092047e-02 -3.64486277e-02  1.84436748e-03
  -5.20776491e-03 -1.33824502e-02 -7.36936774e-03  4.05092318e-02
   1.18667889e-02 -2.44176672e-02  1.12398830e-02  5.63434480e-03]
 [-1.09831557e-02 -1.20705441e-02  2.22484784e-04  2.71053569e-03
   4.63877953e-03  9.60548461e-03  1.22158698e-02  2.24430801e-02
  -2.02628202e-03 -1.56150945e-02 -4.19509465e-03 -5.75342086e-03
   1.02965733e-02  2.40332050e-02 -1.20266483e-02  3.03588108e-02
  -9.34081704e-03  3.18877285e-02 -5.12584773e-03 -5.22467899e-02
  -2.08409069e-02 -3.75787167e-02 -2.56754272e-02  4.00573957e-03]
 [-3.04584244e-02 -1.76642403e-02  3.24928819e-03 -5.36676882e-03
   2.34850909e-02  3.93353511e-02  2.59299126e-02  2.33987636e-02
  -5.79148484e-03  3.61507952e-02  3.75628349e-02  9.46551076e-03
  -8.95169033e-03 -9.24188750e-03  9.23837613e-03 -7.19868771e-03
   2.75257850e-02  3.21439012e-02  1.56273363e-02  2.68034909e-02
   2.70613646e-02 -1.73329479e-02 -7.75433368e-03 -6.91664156e-03]
 [-2.05037592e-02  3.19680251e-02  2.27892818e-02  5.48240148e-03
   8.54726860e-03  5.70769394e-03 -8.70639094e-03  2.66116577e-03
  -6.46835797e-03 -3.68485591e-02  4.03001547e-03 -5.25230464e-02
  -1.12678501e-02 -5.32079750e-03  9.41801138e-03 -1.23138067e-02
   8.40544498e-03  1.90851660e-02  1.37519092e-02  1.90931161e-02
  -5.75773256e-03 -1.47256721e-04 -3.00084763e-03  7.12761916e-02]
 [ 1.79763127e-02  8.16355179e-03  2.52530113e-02  2.81088595e-02
   1.75587235e-02  1.09708696e-02 -5.64787084e-02 -3.30250996e-03
   2.20116407e-02 -7.51907809e-03 -2.69529922e-02  1.28902453e-02
  -6.30618495e-03 -5.35388775e-02  1.28959088e-02  1.86208118e-02
  -1.09213878e-03  1.15995531e-02  9.30047180e-03 -6.33901148e-03
   3.60949740e-02 -4.93365118e-02  3.59910268e-03 -1.29419609e-02]
 [-1.67590120e-02 -4.29579134e-02  5.30797302e-02 -3.18156423e-02
   3.97497763e-02  8.36287563e-03 -2.39419717e-02  3.40212995e-02
   2.01623583e-02 -1.73835315e-02 -4.61349011e-05 -7.76940338e-03
  -1.84560815e-02  9.77913036e-05 -4.31878777e-02 -1.27284936e-02
  -2.43458561e-02 -8.39442854e-03  1.96966272e-03  4.15425003e-04
  -2.20465072e-02  2.30292573e-02  4.03903660e-02 -1.73972245e-02]
 [ 1.82702350e-02 -5.50908682e-02  7.47826489e-03  3.01290049e-02
  -1.12846138e-02  2.65555613e-02 -3.31325760e-02 -3.66096468e-03
  -3.96818220e-02 -7.92633260e-03 -2.74871806e-02  2.13450367e-02
  -1.15882882e-02  2.17113470e-02  2.05777324e-02 -1.65681570e-02
   6.42503928e-02  1.85696954e-02 -4.16021050e-02 -8.44956012e-04
  -2.35732216e-02  1.39101919e-02  1.57985940e-02  1.88573456e-02]
 [ 7.04319450e-03  1.43993478e-02  2.56576561e-02 -1.49093270e-03
   1.43229394e-02 -4.35418770e-02  2.22376524e-02  2.58674206e-02
  -3.91776083e-02 -4.66092087e-02  1.65183410e-02  2.19187061e-02
  -5.81729812e-02  6.13705522e-03  3.03041535e-02  5.55342625e-02
   1.26875113e-02 -1.75442509e-02 -1.01553353e-02  3.95087258e-02
  -1.34503613e-02 -1.24420319e-02 -1.31019794e-02 -3.16697057e-02]
 [-3.76789760e-02 -1.25500350e-02  2.37026923e-02 -3.01938840e-02
   3.96966792e-03  3.51924353e-03 -2.09615457e-02 -6.11469288e-03
  -7.65975750e-02  3.50991797e-02 -3.01946217e-02 -3.87018807e-02
   5.83049902e-02 -2.69359592e-02  2.91374380e-02  3.08478621e-02
  -2.63066888e-02 -1.64514447e-02 -1.57421813e-02  1.53359264e-02
  -8.03592302e-03  2.54145529e-03 -1.34522294e-02 -1.52617624e-02]
 [-3.03836058e-02 -2.67482886e-02  1.67796979e-03  8.25158638e-02
   1.48896244e-02 -4.23725369e-02 -7.31702279e-03  5.52373085e-02
   6.31035719e-03  2.17773202e-02  3.32037240e-02  5.98132636e-03
   3.18567147e-03 -1.45433724e-02  2.50271451e-02 -2.54665584e-02
  -3.56422060e-02 -4.08491332e-02 -3.43962886e-02 -2.07876144e-02
   1.22996656e-02  1.52357611e-02 -2.51898134e-02  2.84078166e-02]
 [-7.14685501e-03 -5.38771798e-02 -4.49624833e-02  1.74060869e-02
   1.41406605e-02 -5.05242345e-02  3.47089053e-02 -1.79330572e-03
   3.21977214e-02 -3.56917599e-02 -8.02102048e-03 -6.59615802e-02
   3.29339387e-02 -2.66077367e-02  1.33503461e-02  2.76214080e-02
   3.20599121e-02  3.16877583e-02 -1.12233437e-02  1.25810032e-02
   2.41458203e-02  6.01892160e-03  5.04142590e-02 -2.50376179e-02]
 [ 8.34314358e-03 -2.86246784e-02 -5.48003075e-02 -3.70703269e-02
   4.98429833e-02  1.92997283e-02 -1.42044421e-02  4.16621647e-03
   8.20932594e-03 -3.99612809e-02 -9.08713680e-03  3.73781012e-02
   1.24016415e-02 -4.51390788e-02 -2.40810785e-02  3.60888701e-02
   1.27708971e-02 -1.66109719e-02 -1.65873329e-04  7.73536184e-03
   1.35547788e-02  5.48303799e-02 -7.37382681e-02  4.13543093e-02]
 [ 7.12464407e-03 -5.06134384e-04 -2.10790502e-02  2.73044080e-02
   5.66122693e-02  3.68832446e-03 -4.11579848e-02 -6.45356111e-02
  -8.75653441e-03  2.65749890e-02  4.54533809e-02 -5.65860814e-02
  -2.63015182e-02  4.28627490e-02 -5.97409646e-02  2.03283294e-02
   1.13468199e-02 -2.69212611e-02 -4.19929410e-02  1.70210605e-02
   6.66388574e-03 -2.85915378e-02 -2.27957177e-02 -1.47299112e-02]
 [-4.81211230e-04  2.30701403e-02 -5.64526044e-03 -1.97348027e-02
  -3.80268834e-02 -4.00081983e-02 -5.33851783e-02  3.50702939e-02
  -4.28864393e-02 -1.82490019e-02  8.81339982e-02  2.11445964e-02
   5.70617353e-02 -3.08202267e-02 -5.50970203e-02  1.86065556e-03
   6.01884576e-02 -7.37200255e-03  1.41836912e-02 -2.41756514e-02
  -1.70018934e-03 -1.79832431e-02  3.16898745e-02  4.13948056e-03]
 [ 4.65432587e-02 -4.81643555e-02 -4.53014468e-02 -5.92962447e-02
  -3.46438001e-02 -2.48009254e-02 -4.69728867e-02  5.30477475e-02
   8.46300730e-03  5.69171018e-02 -5.45710246e-04 -3.51412808e-02
  -5.18526682e-02  2.61045423e-02  2.91482368e-02  3.18469641e-02
  -1.62372762e-02 -1.51555500e-02  2.99809927e-02  8.01445617e-03
   1.46667215e-02 -3.19844037e-02  1.26804124e-02  5.18831317e-02]
 [ 1.82085230e-02 -4.93061649e-02  2.96750956e-02  3.66723628e-02
  -7.62198440e-02  1.41475565e-02  3.91589196e-02 -4.53517492e-02
  -3.44931632e-02  7.48600695e-03  3.14929247e-02  1.09455215e-02
  -3.69358620e-02 -5.81025988e-02 -6.63619737e-02  5.64007082e-02
  -6.49407321e-02  4.01399995e-02 -1.17912591e-02  1.60901229e-02
   1.70042165e-02  1.89202840e-02  1.73095391e-02  4.19047238e-02]]
axlens: [0.051088414928168936, 0.05582701892511675, 0.05654231073042236, 0.06402078520435654, 0.07903729714208958, 0.08347319148492452, 0.08954608284677146, 0.09250350299297536, 0.09428750981560978, 0.0973182058035659, 0.10786908166963154, 0.11340849119917165, 0.12033176635704317, 0.1268510381806009, 0.1341164624149319, 0.1403926654945687, 0.144639289704598, 0.1519876189999924, 0.1574883553782208, 0.1624694552288093, 0.165485613639564, 0.17754645624979135, 0.1794929525013046, 0.19544738714377044]
s: 0.0.

The arrays "u", "u_left", "u_right" and "u_prop" appear to be identical and "u_hat" is an array of zeros. This then leads to the quantity "s" equaling zero.

Strangely enough, the runs with 1000 and 10,000 live points are still running!

Bamash commented 4 years ago

I've also noticed that the runs get stuck after about 1800-1900 iterations.

Bamash commented 4 years ago

Hi Joshua, I was wondering if you had any thoughts on the above issue :)

joshspeagle commented 4 years ago

Sorry for the late response -- totally missed this notification in my inbox. Thanks for bumping it.

The error message indicates that the slice being proposed hasn't found a single point whose likelihood satisfies the hard likelihood constraint. Why this is happening is a mystery to me, since the actual axes seem okay (i.e. the ellipsoid it is proposing from is well-defined) and, by construction, the starting position is one of the live points. Slice sampling works by "stepping out" an interval along a given direction until the "left" and "right" edges both have likelihood < likelihood_star, and then shrinks the interval towards the starting position until a proposed point has likelihood > likelihood_star. The starting position appearing to have a likelihood < likelihood_star should never happen unless there are pathologies in the likelihood or some crazy bug in the code has occurred which I just haven't tracked down yet.
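The stepping-out and shrinking procedure described above can be sketched as follows. This is a generic, simplified slice-sampling step under a hard likelihood constraint, not dynesty's actual `sample_slice`; the function name `slice_step`, the `max_iter` cap and the `1e-12` width tolerance are assumptions made for illustration.

```python
import random


def slice_step(logl, u, axis, loglstar, rng, max_iter=1000):
    """One slice-sampling update along `axis`, starting from point `u`.

    Returns a new point with logl(point) > loglstar, or raises
    RuntimeError if the interval collapses onto the starting point.
    """
    def point(t):
        # Position a signed distance t (in units of `axis`) from `u`.
        return [ui + t * ai for ui, ai in zip(u, axis)]

    # Place an interval of length 1 (in units of `axis`) randomly around u.
    left = -rng.random()
    right = left + 1.0

    # Step out: extend each edge until it violates the constraint.
    for _ in range(max_iter):
        if logl(point(left)) <= loglstar:
            break
        left -= 1.0
    for _ in range(max_iter):
        if logl(point(right)) <= loglstar:
            break
        right += 1.0

    # Shrink: propose inside the interval; on rejection, pull the edge on
    # the proposal's side in towards the starting point (t = 0).
    for _ in range(max_iter):
        if right - left < 1e-12:
            break
        t = left + rng.random() * (right - left)
        candidate = point(t)
        if logl(candidate) > loglstar:
            return candidate
        if t < 0.0:
            left = t
        else:
            right = t
    raise RuntimeError("Slice sampler has failed to find a valid point.")


def gauss_logl(x):
    # Toy log-likelihood: a standard Gaussian without normalisation.
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(0)
print(slice_step(gauss_logl, [0.0, 0.0], [1.0, 0.0], -0.5, rng))
```

If the starting point itself violates the constraint and no valid region lies within the initial interval, as with `slice_step(gauss_logl, [3.0, 0.0], [1.0, 0.0], -0.5, rng)`, every proposal is rejected, both edges collapse onto the start, and the sketch raises the same kind of error as the one reported above.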

You can see the sampling routine here (sample_slice in dynesty/sampling.py, as named in the traceback). If you spot anything obvious please let me know, but judging from the error message I'm actually quite confused.

How did the other runs proceed, by the way?

Bamash commented 4 years ago

I can't spot anything obvious in the code. To see if it was maybe an issue with the likelihood itself, I ran it with the pyMultiNest package. I didn't encounter any issues there.

Should I maybe try switching the sampling method to "rslice"? Based on what I've read in the documentation, that seems to be an alternative and is based on the PolyChord algorithm.

Unfortunately, the other runs gave me the same error message, though this happened quite a while after the other two.

joshspeagle commented 4 years ago

Yes, try rslice (also change slices to be larger, to match the numbers for rwalk) and see if that maybe fixes things. If so, that suggests the principal axes might have some issues that I'm not catching, and that I should switch the default sampling method to compensate while I look into the problem in additional detail. Sorry about all the trouble.

joshspeagle commented 4 years ago

rwalk might also be an option, if you set walks to be like 50 or 100. Failures in any/all of these cases would give some information on what could be going on.
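The two alternatives suggested above might be set up roughly as follows. This is a sketch: `SAMPLER_OPTIONS` and `build_sampler` are hypothetical names, and `slices=25` assumes that "match the numbers for rwalk" refers to a default of 25 walks, which may differ between dynesty versions. The `sample`, `slices` and `walks` keywords are the ones discussed in this thread.

```python
# Keyword arguments for the two alternative sampling methods.
# The specific numbers are illustrative assumptions (see the note above).
SAMPLER_OPTIONS = {
    "rslice": {"sample": "rslice", "slices": 25},
    "rwalk": {"sample": "rwalk", "walks": 50},
}


def build_sampler(loglike, prior_transform, ndim, method):
    """Create a DynamicNestedSampler using one of the alternative methods."""
    # Imported lazily so the options above can be inspected without dynesty.
    import dynesty

    return dynesty.DynamicNestedSampler(
        loglike, prior_transform, ndim, **SAMPLER_OPTIONS[method]
    )
```

With a user-supplied likelihood and prior transform, a run would then look like `build_sampler(loglike, prior_transform, 24, "rslice").run_nested()`. Comparing which of the methods fail would help narrow down where the problem lies.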