joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Dynamic NS with rwalk idles with no real progress leading to memory leak #352

Closed 3fon3fonov closed 2 years ago

3fon3fonov commented 2 years ago

The image below summarizes the problem.

[Image: dynesty_problem]

This is dynesty wrapped within the Exo-Striker exoplanet toolbox. The run above is a multi-dimensional fit, i.e. a joint fit of transit and Doppler exoplanet data with one planet. Usually, the Exo-Striker uses dynesty with no problem, but sometimes the dynamic nested sampling gets badly stuck and I cannot recover the (otherwise expensive) run. In this example, using dlogz=True leads to a severe memory leak. If I set dlogz=False and define "maxiter" and "maxcall", it sometimes works, but often the pools get stuck and again I cannot recover the run.

This is with: Dynamic NS 100% focused on the posterior, random walk, and "multi" bounding option.

Additionally:

Linux Ubuntu 18.04, python 3.8, pathos==0.2.6, numpy==1.20.3, corner==2.1.0, scipy==1.7.1

Any ideas?

segasai commented 2 years ago
3fon3fonov commented 2 years ago

Sorry, I wanted to list the dynesty version and I forgot. I am using dynesty==1.1, installed via pip.

Usually, Exo-Striker runs do not take much memory. As I wrote above, this problem occurs sporadically with dynesty, and never with "emcee", for example. Now, if I run the same setup but with different priors, say a more conservative prior space, then the run is very likely to converge successfully. So my temporary fix is to play with the prior space, but this is not stable and is time-consuming. I don't see how this can be related to the Exo-Striker. It is still possible that this is related to the multiprocess pool manager "pathos", but since pathos is a wrapper around "multiprocess" I am a bit skeptical.

And yes, almost 12 hours after iteration 156730, there is no output. This is the major problem I am reporting here.

segasai commented 2 years ago

I see. Given what you say, my recommendation is to try the dynesty version from the git master branch (that will be the next 1.2 version) and report whether you see the issue there. If you still see the run hanging, you should try to interrupt the run and report the traceback.

3fon3fonov commented 2 years ago

Ok, with the GitHub version it seems to work. However, I was unable to see the results because of the following crash:

Traceback (most recent call last):
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/worker.py", line 64, in run
    result = self.fn()
  File "/home/trifonov/git/exostriker-ready/exostriker/gui.py", line 7772, in run_nest
    fit = rv.run_nestsamp(fit)
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/RV_mod/__init__.py", line 2132, in run_nestsamp
    add_ns_samples(obj,sampler)
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/RV_mod/functions.py", line 699, in add_ns_samples
    obj.ns_sampler.lbf     = {k: np.array([obj.e_for_mcmc[k], True]) for k in range(len(obj.e_for_mcmc))}
  File "/home/trifonov/.local/lib/python3.8/site-packages/dynesty/results.py", line 323, in __setattr__
    raise RuntimeError("Cannot set attributes directly")
RuntimeError: Cannot set attributes directly
<class 'TypeError'> object of type 'Results' has no len() <traceback object at 0x7f296c2f1d40>
===== 2021.11.28 23:11:04 =====
Traceback (most recent call last):
  File "/home/trifonov/git/exostriker-ready/exostriker/gui.py", line 7653, in worker_nest_complete
    self.check_cornerplot_samples()
  File "/home/trifonov/git/exostriker-ready/exostriker/gui.py", line 8143, in check_cornerplot_samples
    if len(fit.ns_sampler)!=0:
TypeError: object of type 'Results' has no len()

This happens only with the GitHub version. I have a small function which copies "sampler.results" into the Exo-Striker fit object, i.e. obj.ns_sampler= dill.copy(sampler.results)

Before, it was an array-like structure, so what changed? I guess I can debug and modify the ES, but I am hesitant to do so before version 1.2 is available on pip. Any idea how to handle this obstacle would be appreciated.
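One way to make a check like the one in check_cornerplot_samples tolerate both the empty-list placeholder and a results object that has no length is sketched below. The helper name has_samples and the stand-in class ResultsLike are hypothetical; neither is part of Exo-Striker or dynesty.

```python
def has_samples(container):
    """Return True if `container` appears to hold sampler output.

    Works for the empty-list placeholder as well as for objects
    that do not define __len__.
    """
    if container is None:
        return False
    try:
        return len(container) != 0
    except TypeError:
        # No __len__ (the TypeError seen in the log above): any such
        # object is treated as "results are present".
        return True


class ResultsLike:
    """Hypothetical stand-in for a results object without __len__."""


print(has_samples([]))             # False
print(has_samples(ResultsLike()))  # True
```

Because the helper only relies on len() and a TypeError fallback, it does not depend on which dynesty version produced the stored object.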

p.s. Initially I thought everything worked fine, since dynesty converged, but I noticed I forgot to include the RVs... I am running another test and will report when it is done.

segasai commented 2 years ago

Great! Yes, there were indeed quite a lot of improvements made over the last half a year: https://github.com/joshspeagle/dynesty/issues/254#issuecomment-912673709

It was certainly not the intention to break the API, but I am not sure that the way you used Results was intended.

3fon3fonov commented 2 years ago

I can confirm that what was hanging before with ver.1.1 now converged with the GitHub version! Of course, I will observe the dynesty behavior more closely in future.

I have an entry in the ES fit object which stores the samplers' results, i.e., from emcee and dynesty. Initially these are empty lists:

fit.ns_sampler= []
fit.mcmc_sampler= []

and when e.g., dynesty is done I make a dill.copy():

fit.ns_sampler= dill.copy(sampler.results)

Then I reuse obj.ns_sampler for my needs. Everything I need for plotting the posteriors and extracting statistics is in fit.ns_sampler, which is the dynesty results plus some extra structures that I added. For example, if I want to change the label of some parameter for the cornerplot, I modify this object. However, now I get errors such as:

Traceback (most recent call last):
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/worker.py", line 64, in run
    result = self.fn()
  File "/home/trifonov/git/exostriker-ready/exostriker/gui.py", line 7772, in run_nest
    fit = rv.run_nestsamp(fit)
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/RV_mod/__init__.py", line 2132, in run_nestsamp
    add_ns_samples(obj,sampler)
  File "/home/trifonov/git/exostriker-ready/exostriker/lib/RV_mod/functions.py", line 699, in add_ns_samples
    obj.ns_sampler.lbf     = {k: np.array([obj.e_for_mcmc[k], True]) for k in range(len(obj.e_for_mcmc))}
  File "/home/trifonov/.local/lib/python3.8/site-packages/dynesty/results.py", line 323, in __setattr__
    raise RuntimeError("Cannot set attributes directly")
RuntimeError: Cannot set attributes directly

So what do you suggest doing in this case? It looks like the change of the API will require quite some changes in the Exo-Striker... Can I modify the results object by force? Maybe this will be done after you release dynesty ver.1.2, but I must ensure there is some backward compatibility with old ES sessions.
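An alternative to forcing attributes onto the results object is to leave it untouched and keep the extra structures next to it in a small container. The sketch below is only an illustration under stated assumptions: FrozenResults is a hypothetical stand-in that rejects attribute assignment in the same way as the traceback above, and SamplerRecord and its field names are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class FrozenResults:
    """Hypothetical stand-in that rejects attribute assignment,
    mimicking the RuntimeError in the traceback above."""

    def __init__(self, samples):
        # Bypass our own __setattr__ once, during construction only.
        object.__setattr__(self, "samples", samples)

    def __setattr__(self, name, value):
        raise RuntimeError("Cannot set attributes directly")


@dataclass
class SamplerRecord:
    """Keeps the sampler output as-is and stores extras beside it."""
    results: Any
    extras: Dict[str, Any] = field(default_factory=dict)


record = SamplerRecord(results=FrozenResults(samples=[0.1, 0.2]))
record.extras["lbf"] = {0: ("K", True)}  # extra structure lives outside results
print(record.results.samples, record.extras["lbf"])
```

With this layout, code that needs the sampler output reads record.results and code that needs the added structures reads record.extras, so nothing has to be written into the results object itself.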

3fon3fonov commented 2 years ago

Actually, this should be easy....

All I need to do is: obj.ns_sampler.lbf --> obj.ns_sampler._lbf

I also noticed a function in the dynesty results class named .asdict(). However, this does not return the old dictionary type. It would be nice to have this option for backward compatibility.

3fon3fonov commented 2 years ago

I will close this because I don't see the weird behavior using the GitHub Version of the sampler. I will reopen if it re-appears and will try to provide a traceback.

So please push version 1.2 as soon as possible!

3fon3fonov commented 2 years ago

First, I would like to emphasize that the Dynamic Nested sampler in the master GitHub works much better than the pip 1.1 version! Most of my runs converge much faster to the desired result. Yet, as I promised, I will reopen this “issue” if I encounter a problem. See the image below:

[Image: Screenshot at 2021-12-07 13-09-18]

After 9763 iterations it just stops working, and there is one open pool that idles for over 200 minutes. I have seen this problem before, so I am not sure if it is related to the original post here, but it is something we should discuss. BTW, exactly the same run but with Livepoints = Npar (13) x 100 = 1300 executed with no problem. I decided to increase this a bit to Livepoints = Npar x 150 = 1950 and the problem appeared.
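A pool that idles indefinitely can at least be turned into a visible error by putting a timeout on each batch of work. The sketch below assumes a pool with the standard-library map_async interface (a ThreadPool is used here only to keep the example light); the wrapper class TimeoutMap is hypothetical, and whether it can be dropped into this particular setup with pathos is an assumption that has not been checked.

```python
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


class TimeoutMap:
    """Wraps a pool so that map() raises instead of waiting forever."""

    def __init__(self, pool, timeout_s):
        self.pool = pool
        self.timeout_s = timeout_s

    def map(self, func, iterable):
        # AsyncResult.get raises multiprocessing.TimeoutError on expiry.
        return self.pool.map_async(func, iterable).get(timeout=self.timeout_s)


def fast(x):
    return x * x


def slow(x):
    time.sleep(1.0)  # stands in for a call that does not return in time
    return x


with ThreadPool(2) as pool:
    guarded = TimeoutMap(pool, timeout_s=0.2)
    print(guarded.map(fast, [1, 2, 3]))  # [1, 4, 9]
    try:
        guarded.map(slow, [1, 2])
    except multiprocessing.TimeoutError:
        print("batch timed out")
```

The timeout does not fix the underlying stall, but it records when and where the run stopped making progress instead of leaving it idle for hours.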

segasai commented 2 years ago

Could you please report the traceback from that hanging run (if you interrupt it with Ctrl-C)? It'll be hard to fix the issue without more info.
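Besides Ctrl-C, Python's standard faulthandler module can write the stack of a stuck process to a file without stopping it. A minimal sketch follows; the function name enable_hang_diagnostics and its parameters are hypothetical, the SIGUSR1 part applies to Unix only, and the log file must be a real file with a file descriptor.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(log_file=sys.stderr, interval_s=None):
    """Make it possible to see where a stuck process is waiting.

    - On Unix, `kill -USR1 <pid>` dumps the stack of every thread.
    - If `interval_s` is given, a dump is also written every
      `interval_s` seconds until cancelled.
    """
    faulthandler.enable(file=log_file)
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    if interval_s is not None:
        faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)
```

Calling this once near the start of the run, with a log file that stays open, would allow a dump to be requested from another terminal while the sampler appears idle. For worker processes the same call would need to be active in each process of interest.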

3fon3fonov commented 2 years ago

It is kind of long, but here you go:


..
...
Process ForkPoolWorker-101:
Process ForkPoolWorker-87:
Process ForkPoolWorker-91:
Process ForkPoolWorker-102:
Process ForkPoolWorker-92:
Process ForkPoolWorker-103:
Process ForkPoolWorker-98:
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
Process ForkPoolWorker-104:
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
Traceback (most recent call last):
KeyboardInterrupt
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
Traceback (most recent call last):
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
KeyboardInterrupt
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 359, in get
    res = self._reader.recv_bytes()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/connection.py", line 219, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/connection.py", line 417, in _recv_bytes
    buf = self._recv(4)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/connection.py", line 382, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Process ForkPoolWorker-120:
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
Traceback (most recent call last):
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/queues.py", line 358, in get
    with self._rlock:
  File "/home/trifonov/.local/lib/python3.8/site-packages/multiprocess/synchronize.py", line 101, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
segasai commented 2 years ago

Thanks, but to be honest I see no dynesty in the trace, so I am not convinced that dynesty is at fault here (it could still be, but the traceback does not support that).

3fon3fonov commented 2 years ago

I am puzzled. A slight change in the prior space and the very same run executes to the end. This problem is either random, or perhaps under some complex conditions dynesty adopts smaller and smaller parameter steps somewhere in the code? If this is not dynesty, could it be scipy.stats.norm.ppf(), which is used for the normal prior?

I would like to get feedback from other dynesty users on whether they have ever experienced such a problem. It is of course possible that the problem is on my side, in particular my OS/Python. Otherwise, I don't really see what I could be doing wrong.

segasai commented 2 years ago

It is hard to tell where the problem could be, but the traceback you showed indicates that the code hangs in some multiprocessing machinery. IMO, I'd try to run your fit using a single process (without the pool; sure, it'll take longer), interrupt it, and provide the traceback (if the problem happens at all).
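
For a stall that only shows up after many hours, interrupting by hand at the right moment is awkward. One possible alternative is a rough sketch using Python's standard `faulthandler` module, which can dump the traceback of every thread once a call has run longer than a timeout, without killing the process. The helper name `run_with_watchdog`, the log path, and the timeout are hypothetical choices for illustration, and `func` stands in for whatever callable starts the single-process fit.

```python
import faulthandler


def run_with_watchdog(func, timeout_s, log_path):
    """Run func() and return its result.

    If func() is still running after timeout_s seconds, the tracebacks of
    all threads are written to log_path, and again every timeout_s seconds
    after that, so the log shows where the code is spending its time.
    """
    with open(log_path, "w") as log:
        # The log file must stay open until the watchdog is cancelled.
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log)
        try:
            return func()
        finally:
            faulthandler.cancel_dump_traceback_later()
```

With a run that normally prints progress every few seconds, a timeout of several minutes would leave the log empty for a healthy run and fill it with repeated snapshots of the stuck call stack otherwise.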

segasai commented 2 years ago

Also, I don't quite know how you run the sampler, but it'd be good to run it in a fully deterministic fashion, i.e. by specifying the random_state. This way, if it is dynesty's fault, the problem should always happen with the same seed parameter.
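
To illustrate why a fixed seed helps, here is a toy sketch in plain Python. It is not dynesty code: `toy_random_walk` is a hypothetical stand-in for any sampler that draws all of its randomness from one seeded generator. In dynesty itself the random state is passed to the sampler through the `rstate` argument; the type of object it expects depends on the installed version, so check the documentation for that version.

```python
import random


def toy_random_walk(seed, nsteps=5):
    """Return the positions of a 1-D Gaussian random walk driven by one seeded RNG."""
    rng = random.Random(seed)  # all randomness comes from this generator
    x = 0.0
    path = []
    for _ in range(nsteps):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# The same seed reproduces the same path, so a failure tied to a particular
# sequence of proposals can be triggered again on demand.
same = toy_random_walk(42) == toy_random_walk(42)
different = toy_random_walk(42) != toy_random_walk(43)
```

If a stall reappears at the same iteration for a given seed, it can be bisected and debugged; if it does not, the cause more likely lies outside the seeded code path, for example in the process pool.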

3fon3fonov commented 2 years ago

Thanks for the input. I assume the past problem is not related to dynesty. I will experiment and in case I find something that could be related to dynesty I will open a separate issue.