joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Unable to resume run following pickle / dill #181

Closed Jammy2211 closed 3 years ago

Jammy2211 commented 4 years ago

I'm on dynesty Version: 1.0.1

Following advice in other issues, I am attempting to create resume functionality by pickling the nested_sampler:

dynesty_sampler = NestedSampler(loglikelihood=fitness, prior_transform=prior, ndim=model.prior_count)

dynesty_sampler.run_nested(maxcall=200)

with open("{}/{}.dill".format(path, "nls"), "wb") as f:
    dill.dump(dynesty_sampler, f)

This works fine, in the sense that the instance is output to hard disk (using either pickle or dill).

The problem is when I load the pickle and attempt to continue sampling:

with open("{}/{}.dill".format(path, "nls"), "rb") as f:
    dynesty_sampler = dill.load(f)

dynesty_sampler.run_nested(maxcall=200)

This gives the following error:

  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dynesty/sampler.py", line 928, in run_nested
    add_live=add_live)):
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dynesty/sampler.py", line 782, in sample
    u, v, logl, nc = self._new_point(loglstar_new, logvol)
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dynesty/sampler.py", line 380, in _new_point
    u, v, logl, nc, blob = self._get_point_value(loglstar)
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dynesty/sampler.py", line 364, in _get_point_value
    self._fill_queue(loglstar)
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dynesty/sampler.py", line 337, in _fill_queue
    point = self.rstate.rand(self.npdim)
AttributeError: 'MultiEllipsoidSampler' object has no attribute 'rstate'

Any help appreciated!

joshspeagle commented 4 years ago

This issue is the result of problems pickling objects like RandomState from np.random. To work around this, #130 and #165 explicitly remove several attributes before pickling, including self.rstate, self.pool, and self.M. Re-defining those via, e.g.,

dynesty_sampler.rstate = np.random
dynesty_sampler.pool = pool
dynesty_sampler.M = pool.map

should hopefully do the trick.

Jammy2211 commented 4 years ago

Does the job, thanks!

Jammy2211 commented 4 years ago

This trick isn't working for the DynamicSampler, which for pickle gives me the error:

Traceback (most recent call last):
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/workspace/runners/x1_gaussian.py", line 87, in <module>
    pipeline.run(dataset=imaging, mask=mask)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/pipeline.py", line 19, in run
    return self.run_function(runner)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/tools/pipeline.py", line 198, in run_function
    results.add(name, func(phase, results))
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/pipeline.py", line 17, in runner
    return phase.run(dataset=dataset, results=results, mask=mask)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/phase/dataset/phase.py", line 61, in run
    result = self.run_analysis(analysis)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/tools/phase.py", line 136, in run_analysis
    return self.optimizer.fit(analysis=analysis, model=self.model)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/non_linear.py", line 123, in fit
    model
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/nested_sampling/nested_sampler.py", line 91, in _fit
    fitness_function.__call__
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/nested_sampling/dynesty.py", line 172, in _simple_fit
    pickle.dump(dynesty_sampler, f)
TypeError: can't pickle module objects

And for dill:

Traceback (most recent call last):
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/workspace/runners/x1_gaussian.py", line 87, in <module>
    pipeline.run(dataset=imaging, mask=mask)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/pipeline.py", line 19, in run
    return self.run_function(runner)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/tools/pipeline.py", line 198, in run_function
    results.add(name, func(phase, results))
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/pipeline.py", line 17, in runner
    return phase.run(dataset=dataset, results=results, mask=mask)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoToy/gaussian/src/pipeline/phase/dataset/phase.py", line 61, in run
    result = self.run_analysis(analysis)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/tools/phase.py", line 136, in run_analysis
    return self.optimizer.fit(analysis=analysis, model=self.model)
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/non_linear.py", line 123, in fit
    model
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/nested_sampling/nested_sampler.py", line 91, in _fit
    fitness_function.__call__
  File "/home/jammy/PycharmProjects/PyAuto/PyAutoFit/autofit/optimize/non_linear/nested_sampling/dynesty.py", line 172, in _simple_fit
    dill.dump(dynesty_sampler, f)
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dill/_dill.py", line 286, in dump
    pik.dump(obj)
  File "/usr/lib/python3.6/pickle.py", line 409, in dump
    self.save(obj)
  File "/usr/lib/python3.6/pickle.py", line 521, in save
    self.save_reduce(obj=obj, *rv)
  File "/usr/lib/python3.6/pickle.py", line 634, in save_reduce
    save(state)
  File "/usr/lib/python3.6/pickle.py", line 476, in save
    f(self, obj) # Call unbound method with explicit self
  File "/home/jammy/PycharmProjects/VirtualEnvs/PyAuto/lib/python3.6/site-packages/dill/_dill.py", line 893, in save_module_dict
    StockPickler.save_dict(pickler, obj)
  File "/usr/lib/python3.6/pickle.py", line 821, in save_dict
    self._batch_setitems(obj.items())
  File "/usr/lib/python3.6/pickle.py", line 847, in _batch_setitems
    save(v)
  File "/usr/lib/python3.6/pickle.py", line 496, in save
    rv = reduce(self.proto)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 528, in __reduce__
    'pool objects cannot be passed between processes or pickled'
NotImplementedError: pool objects cannot be passed between processes or pickled

Does this require an update to the __getstate__ method (or some other magic method) for the dynamic sampler?

joshspeagle commented 4 years ago

Yes, I'd have to add in a similar method to the dynamicsampler object as exists for the regular sampler object. I can try to add this in sometime soon-ish, depending on how soon you need this working. The internals are a little bit more of a pain to work with, but re-initialization should be straightforward.

Jammy2211 commented 4 years ago

Great - there's no rush. In the project I'm implementing dynesty in, the two samplers are pretty much interchangeable, so I can get everything set up using the static sampler and update dynesty whenever there's a new release :).

joshspeagle commented 4 years ago

Okay, sounds good. Whenever this becomes more timely, just ping me here or over email just to let me know I should add this in sooner rather than later.

3fon3fonov commented 4 years ago

I would highly appreciate this fix. I am unable to pickle/dill the dynesty sampler.

joshspeagle commented 4 years ago

Okay, I'll move this up on my to-do list and also plan to add additional information to the documentation.

Jammy2211 commented 4 years ago

We've just hit a barrier where we are unable to pickle the dynesty instance because it is > 4 GB, which is the serialization limit of pickling. This is because the log-likelihood function we are passing to dynesty contains the data we are fitting, and decoupling the two wouldn't be much fun. In general this is somewhat problematic, as having lots of pickled instances of 100 MB - 300 MB doesn't do our laptop hard disks any good :(.

Is it feasible to pickle / output only the low-memory parts of the sampler (the sampling history) without pickling the log-likelihood function? I'll try a few experiments on my laptop....

Jammy2211 commented 4 years ago

Found the following workaround, which removes the log-likelihood function before pickling and restores it afterwards:

sampler_pickle = sampler  # note: this is an alias, not a copy
sampler_pickle.loglikelihood = None

with open(f"{self.paths.samples_path}/dynesty.pickle", "wb") as f:
    pickle.dump(sampler_pickle, f)

sampler_pickle.loglikelihood = fitness_function

joshspeagle commented 4 years ago

This should now be easier to do with the most recent PRs merged in. Let me know if those help, or if I need to clean anything up to improve functionality.

Jammy2211 commented 3 years ago

Seems like it's all sorted!

Nicholaswogan commented 2 years ago

@joshspeagle can you add a little bit of example code to the Docs, which demonstrates how to save partial progress?

segasai commented 2 years ago

I'm not entirely sure anything more than this is needed: https://dynesty.readthedocs.io/en/latest/quickstart.html?highlight=generator#running-externally

If there is a snippet/example for running on HPCs with interruptions, it may be worth a section; I'd be happy to see a snippet added. (And the way of dealing with the pool after a restart may need some API improvement.)

Nicholaswogan commented 2 years ago

What is the best way to save the sampler? Will these saving methods be OK with C extensions and compiled numba? Is there a way to restart from a pickled sampler.results?

segasai commented 2 years ago

The sampler can be pickled/dilled if the likelihood function can. If the likelihood function is not pickleable, it likely needs to be set to None before pickling. Regarding restarting from results, I don't believe that is possible, and I don't know if it is worth implementing. IMO it is worth implementing a more user-friendly way of setting a pool after the restore (and maybe setting the likelihood function, if it's also non-pickleable).

Nicholaswogan commented 2 years ago

Thanks! I guess it makes the most sense to always save samplers, instead of saving results.

segasai commented 2 years ago

I think if the goal is to continue sampling, then yes, otherwise I'd say saving results is better.

Nicholaswogan commented 2 years ago

When running

for it, res in enumerate(sampler.sample(dlogz=0.5)):
    pass

I get

AttributeError: 'DynamicSampler' object has no attribute 'sample'