johannesulf / nautilus

Neural Network-Boosted Importance Nested Sampling for Bayesian Statistics
https://nautilus-sampler.readthedocs.io
MIT License

Sporadically the sampler hangs and displays 'status: stopped' #55

Open felixvecchi opened 4 weeks ago

felixvecchi commented 4 weeks ago

Hi!

Sporadically (roughly once every few hundred fits) the sampler hangs. I think something goes wrong from the start, because it reports status: stopped from the first update onwards. I've included an example output below:

2024-10-24 07:13:47,180 - autofit.non_linear.search.abstract_search - INFO - Starting non-linear search with 4 cores.
2024-10-24 07:13:47,182 - ID=9993811.0 - INFO - The output path of this fit is /srv/beegfs/scratch/users/f/fvecchi/Run/output/LastShell/shear_noise_fixc_zs2.0/ID=9993811.0/8e4319cfc501ed4880c1d196df63ae7b
2024-10-24 07:13:47,183 - ID=9993811.0 - INFO - Outputting pre-fit files (e.g. model.info, visualization).
2024-10-24 07:13:47,866 - ID=9993811.0 - INFO - Starting new Nautilus non-linear search (no previous samples found).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 12 | 1 | 4 | 5000 | 1.0000 | 1 | -2672.41
2024-10-24 07:14:28,277 - ID=9993811.0 - INFO - Fit Running: Updating results after 5000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 19 | 2 | 4 | 10000 | 1.0000 | 3 | -2563.61
2024-10-24 07:15:38,899 - ID=9993811.0 - INFO - Fit Running: Updating results after 10000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 22 | 2 | 4 | 15000 | 1.0000 | 1 | -2547.88
2024-10-24 07:16:20,280 - ID=9993811.0 - INFO - Fit Running: Updating results after 15000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 22 | 2 | 4 | 20000 | 1.0000 | 1 | -2546.40
2024-10-24 07:16:53,677 - ID=9993811.0 - INFO - Fit Running: Updating results after 20000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 22 | 2 | 4 | 25000 | 1.0000 | 1 | -2546.95
2024-10-24 07:17:27,554 - ID=9993811.0 - INFO - Fit Running: Updating results after 25000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 22 | 2 | 4 | 30000 | 1.0000 | 1 | -2547.29
2024-10-24 07:18:06,671 - ID=9993811.0 - INFO - Fit Running: Updating results after 30000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 25 | 2 | 4 | 35000 | 1.0000 | 1 | -2547.34
2024-10-24 07:20:02,116 - ID=9993811.0 - INFO - Fit Running: Updating results after 35000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 25 | 2 | 4 | 39800 | 1.0000 | 1 | -2547.34
2024-10-24 07:21:26,922 - ID=9993811.0 - INFO - Fit Running: Updating results after 40000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 26 | 2 | 4 | 44600 | 0.9999 | 1 | -2547.34
2024-10-24 07:24:01,929 - ID=9993811.0 - INFO - Fit Running: Updating results after 45000 iterations (see output folder).
Starting the nautilus sampler...
Please report issues at github.com/johannesulf/nautilus.
Status | Bounds | Ellipses | Networks | Calls | f_live | N_eff | log Z
Stopped | 27 | 1 | 4 | 49100 | 0.9999 | 1 | -2547.34
2024-10-24 07:25:03,029 - ID=9993811.0 - INFO - Fit Running: Updating results after 50000 iterations (see output folder).

johannesulf commented 4 weeks ago

Hi @felixvecchi! Thanks for reporting an issue. Can you maybe describe the issue in a bit more detail? It isn't clear from the output you posted what exactly the issue is.

Generally, the status "stopped" means that the sampler hit the time limit or the maximum number of likelihood calls allowed before converging, i.e., before reaching the desired effective sample size or finishing exploration.

It seems like in the output above, nautilus was re-started repeatedly, each time increasing the maximum number of likelihood calls by 5000. That's a valid use case and should work fine. However, I currently don't see any issue with the output. Can you tell me what part of the output you think should be different?
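The restart pattern described here can be mimicked with a toy model. This is only a sketch, not code from nautilus or PyAutoFit: `ToySampler`, its `calls_needed` parameter, `run_in_chunks`, and the "Finished" label are hypothetical names chosen for illustration. It shows why a run that is resumed with a call cap growing by 5000 each time reports "Stopped" after every chunk until the cap exceeds what convergence requires.

```python
from dataclasses import dataclass


@dataclass
class ToySampler:
    """Hypothetical stand-in for a sampler with a likelihood-call cap."""

    calls_needed: int  # calls required to converge (assumed, for illustration)
    calls_made: int = 0

    def run(self, n_like_max: int) -> bool:
        """Advance until converged or the cap is hit; return True if converged."""
        # Never go backwards, never exceed the cap, never exceed what is needed.
        self.calls_made = min(self.calls_needed, max(self.calls_made, n_like_max))
        return self.calls_made >= self.calls_needed


def run_in_chunks(sampler, chunk=5000, max_chunks=100):
    """Resume the sampler repeatedly, raising the call cap by `chunk` each time."""
    statuses = []
    for i in range(1, max_chunks + 1):
        converged = sampler.run(n_like_max=i * chunk)
        statuses.append(("Finished" if converged else "Stopped", sampler.calls_made))
        if converged:
            break
    return statuses


history = run_in_chunks(ToySampler(calls_needed=12000))
for status, calls in history:
    print(f"{status:8s} | {calls}")
```

In this toy run the first two chunks end as "Stopped" (at 5000 and 10000 calls) and the third ends as "Finished" at 12000 calls. If `calls_needed` is very large, every chunk ends as "Stopped", which matches the shape of the output posted above without implying that anything is broken.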

felixvecchi commented 4 weeks ago

Hi! Well the issue is that it keeps doing this indefinitely. I didn't input a time limit or a maximum number of likelihood calls. So I would like it to either raise an error saying that something went wrong or complete the fit.
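One way to get an error instead of an endless loop would be an external check on the status rows. The following is a heuristic sketch only, not a feature of nautilus or PyAutoFit: `parse_status_row`, `is_stalled`, the window of 5 rows, and the `min_n_eff` threshold of 10 are all assumptions made for illustration. The rows are copied from the output posted above.

```python
def parse_status_row(row):
    """Split one 'Status | Bounds | ... | log Z' row into typed fields."""
    keys = ["status", "bounds", "ellipses", "networks",
            "calls", "f_live", "n_eff", "log_z"]
    values = [v.strip() for v in row.split("|")]
    if len(values) != len(keys):
        raise ValueError(f"unexpected row: {row!r}")
    parsed = dict(zip(keys, values))
    for key in ("bounds", "ellipses", "networks", "calls", "n_eff"):
        parsed[key] = int(parsed[key])
    for key in ("f_live", "log_z"):
        parsed[key] = float(parsed[key])
    return parsed


def is_stalled(rows, window=5, min_n_eff=10):
    """Heuristic: the last `window` rows are all 'Stopped' with N_eff below `min_n_eff`."""
    if len(rows) < window:
        return False  # not enough history to judge
    recent = [parse_status_row(r) for r in rows[-window:]]
    return all(r["status"] == "Stopped" and r["n_eff"] < min_n_eff for r in recent)


rows = [
    "Stopped | 12 | 1 | 4 | 5000 | 1.0000 | 1 | -2672.41",
    "Stopped | 19 | 2 | 4 | 10000 | 1.0000 | 3 | -2563.61",
    "Stopped | 25 | 2 | 4 | 35000 | 1.0000 | 1 | -2547.34",
    "Stopped | 25 | 2 | 4 | 39800 | 1.0000 | 1 | -2547.34",
    "Stopped | 26 | 2 | 4 | 44600 | 0.9999 | 1 | -2547.34",
    "Stopped | 27 | 1 | 4 | 49100 | 0.9999 | 1 | -2547.34",
]

if is_stalled(rows):
    print("No growth in N_eff over the last updates; consider aborting this fit.")
```

A wrapper script could raise an exception at the point where this prints a message. The thresholds would need tuning per problem, since a slow but healthy fit can also show a small N_eff for many updates.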

johannesulf commented 4 weeks ago

Thanks for getting back so quickly. Can you maybe post the code that leads to the output above?

felixvecchi commented 4 weeks ago

I'm using James Nightingale's PyAutoFit, and I think the only line relevant to nautilus is:

search = af.Nautilus(
    path_prefix=os.path.join("LastShell"),
    name=id_dir,
    unique_tag=dataset_name,
    n_live=150,
    number_of_cores=4,
    iterations_per_update=1e7
)

johannesulf commented 4 weeks ago

Thanks! I'm not sure how PyAutoFit interacts with nautilus. Thus, I'm still not sure whether this is an issue at all and, if so, whether this is an issue with nautilus or PyAutoFit. I can look into this a bit and get back to you.

By the way, I would recommend increasing the number of live points (n_live) substantially. I'd use at least 1000. This has to do with the accuracy of the final result.

johannesulf commented 3 weeks ago

@felixvecchi This doesn't seem to be an issue with nautilus, as far as I can tell. PyAutoFit specifies a maximum number of likelihood calls when calling nautilus. That nautilus enters the status stopped is expected since it will hit the maximum number of likelihood calls. You may reach out to the developers of PyAutoFit since they'll know better how nautilus is called from within PyAutoFit.