DynestyDynamic sampler not creating checkpoint file

Jammy2211 commented 1 year ago

Dynesty version 2.0.1

Describe the bug

When I run a model fit with StaticSampler, passing checkpoint_file=checkpoint_file creates the checkpoint file, and I can resume the run via StaticSampler.restore(fname=self.checkpoint_file).

If I swap the StaticSampler out for the DynamicNestedSampler, the checkpoint file is not created and resuming does not work.

Everything between the two runs is identical except the sampler.

Setup

from dynesty import DynamicNestedSampler

# pool, model, iterations, checkpoint_file and the config dicts come from the surrounding fitting code
sampler = DynamicNestedSampler(
    loglikelihood=pool.loglike,
    prior_transform=pool.prior_transform,
    ndim=model.prior_count,
    queue_size=self.number_of_cores,
    pool=pool,
    **self.config_dict_search
)

sampler.run_nested(
    maxcall=iterations,
    print_progress=not self.silence,
    checkpoint_file=checkpoint_file,
    **config_dict_run
)

Dynesty output

Bug

N/A

Additional context

N/A

segasai commented 1 year ago

Can you give a self-contained example? And maybe provide the contents of those dictionaries?

I'm not sure I see the problem, as the dynamic sampler is explicitly tested here: https://github.com/joshspeagle/dynesty/blob/master/tests/test_resume.py

Also, the default time between checkpoints is 60 seconds. Does your code run at least that long?
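Why a short run can end without a checkpoint file follows from interval-based saving. The following is a minimal, hypothetical sketch of that idea, not dynesty's actual implementation; `run_with_timed_checkpoints`, `save` and the injectable `clock` are illustrative names. A save happens only once at least `checkpoint_every` seconds have passed since the previous one, so a run that finishes inside the interval never writes anything.

```python
import time
from typing import Callable, List


def run_with_timed_checkpoints(
    n_iterations: int,
    checkpoint_every: float,
    save: Callable[[int], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Hypothetical sampler loop that checkpoints on a wall-clock interval only."""
    last_save = clock()
    for it in range(n_iterations):
        # ... one sampling iteration would happen here ...
        now = clock()
        # Save only once enough wall-clock time has passed since the last save.
        if now - last_save >= checkpoint_every:
            save(it)
            last_save = now


if __name__ == "__main__":
    saved: List[int] = []
    # A fast 10-iteration run with a 60-second interval finishes before any save.
    run_with_timed_checkpoints(10, checkpoint_every=60.0, save=saved.append)
    print(saved)  # prints []
```

Injecting `clock` keeps the sketch deterministic to test; in a real run it would simply be the system's monotonic clock.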

segasai commented 1 year ago

If there are no updates, I'll close this issue, as I'm not sure there is a bug.

Jammy2211 commented 1 year ago

The following example does not produce a checkpoint file for the dynamic sampler for me, but does for the static sampler:

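import numpy as np

from dynesty import NestedSampler, DynamicNestedSampler

# Deliberately trivial log-likelihood: ignores the parameter vector and returns a random value
def fitness_function(model):
    return 100.0 * np.random.random(1)[0]

# Identity transform: parameters are sampled uniformly on the unit cube
def prior_transform(cube):
    return cube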

sampler = NestedSampler(
    loglikelihood=fitness_function,
    prior_transform=prior_transform,
    ndim=3,
)

sampler.run_nested(
    maxcall=10,
    print_progress=False,
    checkpoint_file="static.savestate",
)

sampler = DynamicNestedSampler(
    loglikelihood=fitness_function,
    prior_transform=prior_transform,
    ndim=3,
)

sampler.run_nested(
    maxcall=10,
    print_progress=False,
    checkpoint_file="dynamic.savestate",
)

Although the likelihood function here is somewhat broken, I see the same behaviour for all my science model fits, so it is not related to how I defined fitness_function (and checkpointing works for the static sampler anyway).

segasai commented 1 year ago

As I suspected, your test does not run long enough for checkpointing to kick in. With this modification, the save file is created:

import time

import numpy as np
from dynesty import DynamicNestedSampler

def fitness_function(model):
    print('sleeping')
    # Make each likelihood call take at least 10 ms so the run lasts longer
    time.sleep(.01)
    return 100.0 * np.random.random(1)[0]

def prior_transform(cube):
    return cube

sampler = DynamicNestedSampler(
    loglikelihood=fitness_function,
    prior_transform=prior_transform,
    ndim=3,
)

sampler.run_nested(maxcall=10,
                   print_progress=False,
                   checkpoint_file="dynamic.savestate",
                   # Seconds between checkpoints (the default is 60)
                   checkpoint_every=1)

Jammy2211 commented 1 year ago

Is there any way to make checkpointing not depend on clock time?

It's hard to generalize checkpointing settings across many different modeling problems!

EDIT: Also, why does DynestyStatic not suffer from this issue if it's to do with clock time?

segasai commented 1 year ago

Regarding why the dynamic sampler shows this issue and the static one does not: I was just investigating that, and found that the static sampler always saves a checkpoint at the very end of the run, regardless of timing, while the dynamic sampler did not. I have fixed that for the dynamic sampler in ef310152cfd920290261c94a38da62ce9e3ec0e4. Regarding checkpointing that is not time dependent: in my opinion, time-based checkpointing is the most useful for cases such as running on HPC, but I'm happy to hear other suggestions.
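The fix described above (always saving at the end, whatever the timing) can be sketched with a hypothetical loop. This is an illustration under assumed names (`run_with_final_checkpoint`, `save`, `clock`), not the code in the commit: timed saves still happen during the run, and one unconditional save after the loop guarantees that a checkpoint exists even when the run is shorter than `checkpoint_every`.

```python
import time
from typing import Callable


def run_with_final_checkpoint(
    n_iterations: int,
    checkpoint_every: float,
    save: Callable[[int], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Hypothetical loop: timed checkpoints during the run plus a forced save at the end.

    `save` receives the number of completed iterations.
    """
    last_save = clock()
    for it in range(n_iterations):
        # ... one sampling iteration would happen here ...
        completed = it + 1
        now = clock()
        if now - last_save >= checkpoint_every:
            save(completed)
            last_save = now
    # Always write a final checkpoint, however short the run was.
    save(n_iterations)
```

The final save may repeat one the timer has just written; that is harmless here, though a real implementation could skip it.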

Jammy2211 commented 1 year ago

If the dynamic sampler now checkpoints at the end of a run, then everything is good!

I agree that any choice of checkpoint frequency (time, samples, accepted samples, etc.) will have pros and cons for different use cases.
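For comparison, a policy keyed on the number of likelihood calls rather than wall-clock time could look like the hypothetical sketch below. It is not a dynesty feature; `CallCountCheckpointer`, `every_n_calls` and `record_calls` are made-up names. Its trigger points are reproducible from run to run, but a sensible `every_n_calls` depends on how expensive each likelihood call is, which is exactly the generalization problem raised earlier in the thread.

```python
from typing import Callable


class CallCountCheckpointer:
    """Hypothetical policy: checkpoint after every `every_n_calls` likelihood calls."""

    def __init__(self, every_n_calls: int, save: Callable[[int], None]) -> None:
        if every_n_calls < 1:
            raise ValueError("every_n_calls must be at least 1")
        self.every_n_calls = every_n_calls
        self.save = save
        self.ncall = 0
        self._last_saved_at = 0

    def record_calls(self, n: int) -> None:
        """Register `n` new likelihood calls and save if the threshold was crossed."""
        self.ncall += n
        if self.ncall - self._last_saved_at >= self.every_n_calls:
            self.save(self.ncall)
            self._last_saved_at = self.ncall

    def finish(self) -> None:
        """Force a final checkpoint unless one was just written."""
        if self._last_saved_at != self.ncall:
            self.save(self.ncall)
            self._last_saved_at = self.ncall
```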

Thanks!

segasai commented 1 year ago

I've just released v2.0.2, which includes the fix that forces the dynamic sampler to save a checkpoint at the end of the run. Keep in mind that checkpointing is not a substitute for persistence: there is no guarantee that a checkpoint file can be read by a different dynesty version from the one that created it. We won't break this on purpose, but the next major release, 2.1, will likely not be able to read 2.0 files.
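Since a checkpoint file is only guaranteed to be readable by the dynesty version that wrote it, one defensive pattern on the user side is to record that version next to the checkpoint and refuse to resume on a mismatch. The sketch below is hypothetical: `write_version_tag`, `can_resume` and the `.version` sidecar file are illustrative names, and in real use the version string would come from something like `dynesty.__version__`.

```python
from pathlib import Path


def write_version_tag(checkpoint_file: str, version: str) -> None:
    """Store the version that created `checkpoint_file` in a sidecar text file."""
    Path(checkpoint_file + ".version").write_text(version)


def can_resume(checkpoint_file: str, current_version: str) -> bool:
    """Allow resuming only if the checkpoint and its tag exist and the versions match exactly."""
    tag = Path(checkpoint_file + ".version")
    if not Path(checkpoint_file).exists() or not tag.exists():
        return False
    # Exact match: checkpoint files are not guaranteed to be portable across versions.
    return tag.read_text().strip() == current_version
```

Requiring an exact match is deliberately conservative; if `can_resume` returns False, the safe fallback is to start a fresh run.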

Jammy2211 commented 1 year ago

Brilliant, thank you!


I wouldn't expect checkpoint files to work across versions anyway! I'm pretty famous for breaking backwards compatibility for my user base 🤣.

joshspeagle / dynesty

DynestyDynamic sampler not creating checkpoint file #405