CalumGabbutt closed this issue 3 years ago
I have no idea what could be causing such behavior. What do the samples look like prior to this error?
I'm running the sampling on an HPC, so unsuccessful sampling runs aren't saved. However, here is a copy of the output log of the samples; does that help? chain4_O7.o2412976.17.txt
That does help a bit. The run actually looks like it's sampling just fine, except that it's clearly reaching some peak in log-likelihood by the end. There are also some warning messages implying it might be near an edge. This implies that the first parameter might be poorly-behaved prior to failure, but it's hard to say for sure without some printouts of prior parameter values. Is there possibly a weird solution where that first parameter can race off to infinity?
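One way to get the "printouts of prior parameter values" mentioned above is to wrap the likelihood so it remembers the last few parameter vectors and flags a non-finite return. This is only an illustrative sketch, not part of dynesty's API; `LoggedLikelihood` and its details are assumptions:

```python
import math
from collections import deque

class LoggedLikelihood:
    """Wrap a loglikelihood to keep a ring buffer of recent parameter vectors."""

    def __init__(self, loglike, history=50):
        self.loglike = loglike          # the user's real likelihood function
        self.history = deque(maxlen=history)  # most recent parameter vectors

    def __call__(self, params):
        self.history.append(list(params))
        val = self.loglike(params)
        if not math.isfinite(val):
            # Dump the recent trajectory before the sampler dies, so you can
            # see whether a parameter raced off toward infinity.
            print("non-finite loglike at params:", params)
            for p in self.history:
                print(p)
        return val
```

Passing `LoggedLikelihood(your_loglike)` to the sampler in place of `your_loglike` should then leave a trail in the log right before any failure.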
There is a degree of collinearity between the first, second and third parameters (each is a rate parameter in a set of differential equations, solved via matrix exponentiation together with a known time parameter). Could that lead to the odd behaviour? Here is a pickled (using joblib) sample for the same data, but with a slightly different model, that did finish successfully ( https://drive.google.com/file/d/1U3RcRUkBk3OAPahUKmR1Ebe9fCs4mtuS/view?usp=sharing )
That could be part of the issue. I don't have time today to dig into the pickle file, but I'll try and take a look at some point soon-ish and see if I can give some additional feedback if anything pops out.
Thank you. If you can explain how to save the incomplete samples, I can have a go at generating them from a run that hits the above error.
One solution I like is to save the output in batches. If you cap each call with something like `maxiter`, then you can easily do something like:
```python
import pickle

for i in range(1, n):
    sampler.run_nested(maxiter=i * batch, ...)
    res = sampler.results
    # pickle.dump needs a file object, not a filename
    with open('res_{}.pkl'.format(i), 'wb') as f:
        pickle.dump(res, f)
```
Since the sampler restarts where it left off, this generally should work unless you're doing something weird with the memory. You can be more sophisticated, but that basic structure should work as a place to start.
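To confirm that each batch file is a complete, reloadable snapshot (so the last file written before a crash holds the incomplete run), here is a minimal stand-alone sketch of the save/reload round trip; the `fake_results` dict is just a stand-in for `sampler.results`:

```python
import os
import pickle
import tempfile

# Stand-in for sampler.results; any picklable object behaves the same way.
fake_results = {'niter': 1500, 'logz': [-42.0], 'samples': [[0.1, 0.2]]}

path = os.path.join(tempfile.mkdtemp(), 'res_1.pkl')

# Save a snapshot (note the file handle: pickle.dump wants a file object).
with open(path, 'wb') as f:
    pickle.dump(fake_results, f)

# Reload it later, e.g. after a crashed run.
with open(path, 'rb') as f:
    restored = pickle.load(f)
```

After reloading, `restored` compares equal to the object that was saved, so a post-mortem analysis can start from the last successful batch.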
Here are the samples generated before the sampler crashed (only nlive=200, but the ValueError occurs at a similar logz value) https://drive.google.com/file/d/19IFEMGTiDo_IrAfOUlmyz5xVK5MEBW39/view?usp=sharing
Thanks. I’ll try to take a look over the next few days and get back to you.
A very late follow-up to this, but I believe the recent improvements to the stability and behaviour of the bounding distributions (#219 and others) should now resolve this and other similar issues, so I'm tentatively closing this.
I'm currently using dynesty to perform static nested sampling on a custom likelihood function with `'multi'` bounds and `'rwalk'` sampling (1500 nlive points). The sampler runs perfectly 99% of the time; however, for some of the data samples, the runs fail with the error:
```
Exception while calling loglikelihood function: params: [ inf 2.86408389e-02 2.22182571e-02 4.04356563e-02 9.23900098e-01 1.17080224e+02 5.64710836e+01 2.27116477e+02 1.33147622e+02 1.88771848e+02 1.46173472e+02 1.30782723e+02 7.62809350e+01 1.34494380e+02 5.20331440e+00 9.39797359e+01 1.34037074e+02 1.10568351e+02 1.17584204e+02 1.15930547e+02 1.23936813e+02 1.29542150e+02 1.02526209e+02 5.47408794e+01 8.79609015e+01 1.21207516e+02 9.86264815e+01 9.85764264e+01 1.25517338e+02 1.77819488e+02 7.18671872e+01 1.05594272e+02 2.60180178e+01 1.91192474e+01 4.99152480e+01 1.79858705e+01 5.50634020e+01 1.46721796e+02 8.48983385e+01 1.74806408e+02 5.72236037e+01 1.71734397e+02] args: [] kwargs: {}
```
Based on the values of the other runs, the "true" value of params[0] is ~1, so the sampler overflowing is a bit surprising. I know this many parameters is pushing what 'rwalk' can sample effectively, but it does seem to work for the majority of samples. Slice sampling seems to take so long as to be unusable. Also, this error occurs very late in the sample run. Do you have any ideas on how to fix this issue?
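As a side note, a common defensive pattern (independent of dynesty, and purely an assumption about the setup here) is to short-circuit the likelihood when a non-finite parameter vector like the `inf` above arrives, returning an effectively-zero likelihood instead of letting the model code raise:

```python
import math

def safe_loglike(params, model_loglike):
    """Guard a likelihood call against non-finite inputs or outputs.

    `model_loglike` is a hypothetical stand-in for the real likelihood.
    Returning a very negative number instead of raising keeps the sampler
    alive; it is a band-aid, not a fix for the underlying degeneracy.
    """
    if not all(math.isfinite(p) for p in params):
        return -1e300  # effectively log(0)
    val = model_loglike(params)
    return val if math.isfinite(val) else -1e300
```

This at least lets a run finish so the samples can be inspected, though it can mask the real cause if the walker genuinely diverges.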