joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License
347 stars 76 forks source link

dynesty>0.9.2 crashes the Exo-Striker GUI if stderr is piped (Py3 only) #183

Closed 3fon3fonov closed 4 years ago

3fon3fonov commented 4 years ago

I am really puzzled by the following problem:

If one uses The Exo-Striker with dynesty==0.9.2, then everything works fine! As soon as you upgrade to dynesty==0.9.3 and above, the GUI crashes without any obvious error (to clarify, it cannot start at all..., and only happens if used with Python3). The Exo-Striker pipes the stderr to the GUI, so if bypassed with the "-debug" flag the tool is back on the track. Thus, I think it must be some missed exception during the stderr piping.

The only thing I was able to catch from the GUI was this:

<class 'KeyError'> 'rstate' <traceback object at 0x7fef607309c8>
===== 2020.05.06 10:05:06 =====
Segmentation fault

More details you can find here: https://github.com/3fon3fonov/exostriker/issues/59

I have the feeling that the "bug" was introduced in one of these two commits: https://github.com/joshspeagle/dynesty/commit/70f69b678b762e9589f5f68ec3caeb57163b925c https://github.com/joshspeagle/dynesty/commit/ed9f3dc301d74350553c8d43f6dc45cee671014d

These were done just before version 0.9.3 (from which the Py3/ES setup no longer works) and in these commits, you are improving the terminal output, which could be related to the sys.stderr problem.

Do you have any idea what might have changed between 0.9.2 and 0.9.3 (and now 1.0.1), that could trigger such behavior? Since this is Py3 related problem, perhaps there is somewhere a Unicode string incompatibility between Py2/Py3? Can you inspect these commits and let me know what do you think?

All the best, Trifon

joshspeagle commented 4 years ago

Hi Trifon,

Hmmm...this is definitely an interesting failure case (the fact that it works in some cases is even stranger). I have no idea what could have triggered it. The only new addition I see if the use of shutil via shutil.get_terminal_size(fallback=(80, 25)). The call to rstate puzzles me as well, since that's not called anywhere when printing out the results. Very strange...

3fon3fonov commented 4 years ago

Well, it could be in my garden, but I really tried to eliminate everything that could be on the ES side. I spend almost a day debugging and nailed down to dynesty. This bug was confirmed by three independent users on MAC OS and Linux. downgrading dynesty to 0.9.2 fixes the Exo-Striker...

Something has changed when upgrading to dynesty==0.9.3 +. Did you request dynesty dependency updates with respect to 0.9.2?

This is definitely a Py3 issue, but we need to get to the root of the problem to bypass it.

joshspeagle commented 4 years ago

I don't think so. If ensuring that shutil is installed fixes this, then that's the issue. Otherwise, I'll have to dig into this to see what could be going on.

Also, this isn't using the tqdm optional progress bar but just the stderr piped output, right?

3fon3fonov commented 4 years ago

Hi Josh,

I will try to experiment with shutil and will let you know. I must say that I commented the line you are referring in the results.py and defined the "terminal size" manually, but there was no change. I am using shutil in a couple of places in the ES and I never had problems before. However, this function get_terminal_size sounds suspicious to me.... Perhaps shutil has different behavior depending on its version, or version of Python?

joshspeagle commented 4 years ago

Sounds good. Let me know how that goes and if you have any more information/logs on what might be happening. If this doesn't fix the issue, I'll dig into it as well as see what turns up.

3fon3fonov commented 4 years ago

Dear Josh,

I must say I am still puzzled, but I least I found when exactly the bug was introduced. It is in this commit:

https://github.com/joshspeagle/dynesty/commit/50483817ae2de6bf24d2e711a71e900a7fa5a778

in particular file dynesty/bounding.py

line from sklearn.covariance import oas as oas_cov

if you remove that import the GUI on Py3 falls apart...

I copy the line in Version 1.0.2 and guess what? It works !!!

This is bizarre.... Any ideas?

joshspeagle commented 4 years ago

wut

I'm pretty confident I removed all of the dependencies on sklearn well before 1.0.2, so the fact that this somehow works is beyond bizarre... I have absolutely no idea what is going on here at all (@_@*)

3fon3fonov commented 4 years ago

Actually all I needed to do is to as a general fix is:

import sklearn

in my main GUI .py file and that's it....

This is mind-blowing. I suspect there is a thrown exception somewhere that affects only Py3.

Let me investigate further... I am using also wotan for transit light curve detrending in the Exo-Striker, and wotan has an optional dependence on sklearn.... This could explain why my WM Mint OS did not crash at dynesy 1.0.2. I don't think I installed the full wotan there.... It is quite possible that dynesty==0.9.2 with its sklearn import was saving me from a crash, that I would otherwise catch.

3fon3fonov commented 4 years ago

Josh,

I think we can safely put this under the carpet... It was quite an endeavor, but I find a workaround. It was indeed on the wotan side, where I check if sklearn is imported. So Dynesty==0.9.2 was indeed giving a false feeling that this import works. I don't understand the problem, it should work the way it was coded, but... I don't want to dig anymore... This problem will never occur to anybody other than me. Case closed, thanks for your time.!