Closed Jammy2211 closed 1 year ago
@Jammy2211 Thanks for raising this issue. Some background on what's happening: when nautilus proposes a bound, it can happen that the newly proposed bound is larger in volume than the previous one. In this case, nautilus simply rejects the new bound and continues adding points to the old one. After some time, it'll try to build a new one again. Generally, bounds being skipped indicates that nautilus is having a hard time figuring out the high-likelihood region. This can be improved by, for example, increasing the number of live points.
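The accept/reject rule described above can be sketched as follows (a simplification for illustration, not nautilus's actual implementation; the function name is made up):

```python
def propose_bound(old_volume, new_volume):
    """Sketch of the bound-update rule described above: a newly
    proposed bound is only accepted if its volume is smaller than
    the current bound's volume."""
    # Reject the proposal if it would enlarge the bound; nautilus then
    # keeps adding points to the old bound and tries again later.
    if new_volume >= old_volume:
        return old_volume, False  # bound skipped
    return new_volume, True       # bound accepted
```

A run where many proposals come back larger than the current bound will therefore show repeated "skipped" messages while the sampler keeps working inside the old bound.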
That bounds are skipped is not inherently problematic. However, calling `sampler.posterior()` shouldn't result in any crash or freeze. I currently don't know why this would happen, but I'll have a closer look and try the file you sent. Thanks so much for providing that.
Okay, I'll do some tests with a higher `n_live`.
The setup works fine for ~90% of datasets, so it must be that specific datasets trigger these issues. Which probably makes sense.
I have increased `n_live` from 75 to 450 but still get the skipped behaviour, so I anticipate it is something specific about the likelihood function for these cases.
Interesting! It seems like it's only a 5-dimensional problem. So I would have expected 450 live points to work quite well. Still, it can happen that some boundaries are skipped once in a while. Does it happen with the same frequency for 450 as it does for 75?
I also tried reproducing the freeze when calling `sampler.posterior()` but was unable to do so. Also, I currently can't imagine why such a freeze would occur at all. Can you produce a freeze using only the checkpoint file you sent me?
```python
from nautilus import Sampler

# `prior` and `likelihood` are the same objects used in the original run;
# the sampler resumes from the existing checkpoint file.
sampler = Sampler(prior, likelihood, 5, n_live=75, filepath='checkpoint.hdf5')
print(sampler.posterior())
```
I have had the issue crop up on another use case, this time in an N=3 dimensional parameter space. So it could well be that it is specific to lower dimensional problems.
The main problem is that `nautilus` seems to never converge when it happens. For the specific fit in question, for most datasets (when this issue doesn't crop up) the fit completes after ~5000 iterations. For datasets where this skipped behavior occurs, `nautilus` can run for >100000 iterations, even though everything but the data I'm fitting is the same.
> Does it happen with the same frequency for 450 as it does for 75?
It looks like it, and in both cases `nautilus` runs for 100000+ iterations as if it's indefinitely "stuck".
> I also tried reproducing the freeze when calling `sampler.posterior()` but was unable to do so. Also, I currently can't imagine why such a freeze would occur at all. Can you produce a freeze working only with the checkpoint file you sent me?
The freeze when calling `sampler.posterior()` may not be what actually happens. It was definitely getting stuck, but it was difficult to test on the HPC where in the code it occurred. I will try to reproduce it, but the crashing behavior could be associated with a slightly different part of the code.
Thanks for the input! I was checking the checkpoint file and I did notice that `nautilus` is having a hard time figuring out the likelihood. This also has to do with the fact that there are two distinct peaks that `nautilus` doesn't separate. The issue may be related to that. @Jammy2211 Can you send me the code for the 3-dimensional problem? Maybe I can find some ways to improve this behavior. `nautilus` shouldn't struggle with 3-dimensional likelihoods.
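For reference, a likelihood of the kind described above can be mocked up with a toy bimodal model (illustrative only, not the reporter's actual likelihood): two narrow, well-separated Gaussian peaks that a single ellipsoidal bound struggles to enclose tightly.

```python
import numpy as np

def log_likelihood(x):
    """Toy bimodal log-likelihood: two narrow Gaussian peaks at
    x = +2 and x = -2 in every dimension. Bounds that must cover
    both peaks at once shrink slowly, mimicking the skipped-bound
    behavior discussed in this thread."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    log_peak_1 = -0.5 * np.sum((x - 2.0) ** 2) / 0.1 ** 2
    log_peak_2 = -0.5 * np.sum((x + 2.0) ** 2) / 0.1 ** 2
    # Stable log-sum-exp of the two mixture components.
    return np.logaddexp(log_peak_1, log_peak_2)
```

Feeding a likelihood like this to a sampler with too few live points is one plausible way to reproduce bounds being skipped, since neither peak dominates and the proposed bounds fail to shrink.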
Apologies for the delay, hectic week.
Could you let me know if you can install the following library: https://pyautocti.readthedocs.io/en/latest/index.html
One dependency is hit or miss whether it installs; if it's problematic, I'll try to come up with a simpler method. If the install is okay, I can send a script.
Thanks! I installed the library with some workarounds. The script may or may not work already. I'll most likely be able to figure it out if it doesn't work already.
@Jammy2211 Let me know if you have the script. I'd be happy to work on it.
Once I tried to reproduce this issue, I no longer got the skipped behavior.
It occurs when the data I input into the analysis has defects / issues which mess up the likelihood function. I improved the data processing and the issue went away on this occasion.
For various different models we are still seeing this skipped behavior crop up now and then, often leading to `nautilus` running for days to complete a fit that normally takes < 12 hours. I will put an issue up here once we have an example that seems appropriate for testing.
For the vast majority of `Nautilus` use-cases, the code runs brilliantly. However, for a small subset, I am running into an issue where the adding of bounds is being skipped:

This leads to extremely long `Nautilus` run times, and once the run is complete it appears to cause some sort of infinite loop when the `sampler.posterior()` function is called. This occurs for a small fraction of input datasets, indicating it is probably something perverse about their likelihood function.
I have uploaded a checkpoint file here: https://drive.google.com/file/d/16LU44iwQ_dckJjfn_5_NycWk6MgPV_GE/view?usp=sharing
Thank you!