mperrin / poppy

Physical Optics Propagation in Python
BSD 3-Clause "New" or "Revised" License

calcPSF halts in certain specific circumstances. #238

Closed. ariedel closed this issue 6 years ago.

ariedel commented 6 years ago

This is another odd one, perhaps particular to my setup. It seems that if you run calcPSF twice within the same method, it works, but if you run it in two different methods, the second call just... freezes.

The reason I want to do this: Pandeia needs the pupil throughput recorded in its PSF FITS files. We get it by running a separate, additional calculation of the throughput for the instrument+mask combination before we start generating PSFs.

Previously, this was part of the main PSF generation method, back when we looped within that function. Now I want to separate it out into its own method, but it doesn't work if I do.

If I run the code in THIS gist, https://gist.github.com/ariedel/b478034bf2f96ce3bc8b2f21e6932c9d, with python make_psf_test.py, WebbPSF starts spitting out PSFs, as you'd expect (though this version exhibits a memory leak on my machine). In this version, pupil throughput is calculated the first time psfgen runs, and then the rest of the PSFs are generated.

If I run THIS code, https://gist.github.com/ariedel/e228bc5bb43ad2e0442f72a31d3b022c, with python make_psf_test2.py, it halts. It doesn't crash, and it doesn't consume resources and jam up the machine; CPU activity just drops to zero and it sits there waiting. The sole difference is that pupil_throughput is now calculated in a separate method before psfgen is run.

I spent a few hours tracing it through the POPPY code, and I think it is FFTW that halts (unfortunately, I don't remember exactly where in the code that was).

mperrin commented 6 years ago

Are you using multiprocessing? And what version of Python? We've seen issues where parallel FFTW calculations don't play well with multiple instances invoked in parallel at a higher level. There are also general issues with multiprocessing on macOS under Python 2.7, but the code should raise an error if you even try that.
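A minimal sketch of the kind of guard described here, assuming the rule is simply "macOS plus Python older than 3.4 means no process-level parallelism". The names multiprocessing_is_safe and require_safe_multiprocessing are hypothetical; this is not poppy's actual check.

```python
import sys


def multiprocessing_is_safe(version_info=None, platform=None):
    """Hypothetical check: is process-level parallelism considered safe here?

    Encodes only the rule from this thread: on macOS ("darwin"), Python
    versions older than 3.4 are treated as unsafe. With no arguments it
    inspects the running interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    if platform is None:
        platform = sys.platform
    major_minor = tuple(version_info[:2])
    return not (platform == "darwin" and major_minor < (3, 4))


def require_safe_multiprocessing(use_multiprocessing, version_info=None, platform=None):
    """Hypothetical guard: raise instead of silently hanging later."""
    if use_multiprocessing and not multiprocessing_is_safe(version_info, platform):
        raise RuntimeError("multiprocessing is unreliable on macOS with Python < 3.4")
```

Failing fast at configuration time is the point of such a guard: an error message is much easier to diagnose than a process that sits at zero CPU.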

The first thing to try is disabling FFTW in the poppy configuration settings, so that you use the regular Python (NumPy) FFT instead. Let me know if that makes the problem go away; if so, that confirms it is definitely FFTW.
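In code, that switch might look like the sketch below. It assumes poppy's astropy-style configuration object, poppy.conf, and its use_fftw item; disable_fftw is a hypothetical helper name, and the import guard exists only so the snippet still runs where poppy is not installed.

```python
try:
    import poppy  # POPPY, the package this issue is filed against
except ImportError:
    poppy = None  # lets the sketch run on machines without poppy


def disable_fftw():
    """Turn off FFTW so poppy falls back to its NumPy FFT path.

    Returns True if poppy was found and the setting was changed.
    """
    if poppy is None:
        return False
    poppy.conf.use_fftw = False  # change this before calling calcPSF
    return True


if __name__ == "__main__":
    print("FFTW disabled:", disable_fftw())
```

If the change should apply to only one calculation, astropy configuration namespaces also provide a set_temp context manager, e.g. with poppy.conf.set_temp("use_fftw", False): around the call.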

ariedel commented 6 years ago

I am indeed using multiprocessing on Python 2.7.13, on Mac OS X 10.12 Sierra.

ariedel commented 6 years ago

Disabling FFTW fixed the halting problem.

mperrin commented 6 years ago

See https://github.com/mperrin/poppy/issues/23 and https://github.com/numpy/numpy/issues/5752

The short version: the way multiprocessing is implemented in Python 2.7 is fundamentally incompatible with Apple's Accelerate framework, so if you're using anything compiled against that library you may (will) see random hangs like this. The best solution we found is to move to Python ≥ 3.4, which provides a different method for starting processes (the "forkserver" and "spawn" start methods) that works reliably.
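On Python ≥ 3.4 the start method can be chosen explicitly. A minimal sketch using the standard library's "spawn" context, which launches fresh interpreter processes instead of fork()-ing the parent; run_in_parallel and make_spawn_context are hypothetical helpers, and math.factorial merely stands in for a per-PSF calculation. Whether "spawn" or "forkserver" suits a particular pipeline better is an assumption worth testing.

```python
import math
import multiprocessing


def make_spawn_context():
    """Return a context whose workers start as fresh interpreters.

    With "spawn", children do not inherit library state (e.g. thread pools
    inside Accelerate or FFTW) copied from the parent by fork().
    """
    return multiprocessing.get_context("spawn")


def run_in_parallel(values, processes=2):
    """Map a picklable function over values in spawned worker processes.

    math.factorial is a hypothetical stand-in for a per-PSF calculation.
    """
    ctx = make_spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.map(math.factorial, values)


if __name__ == "__main__":
    print(run_in_parallel(range(6)))  # prints [1, 1, 2, 6, 24, 120]
```

The if __name__ == "__main__": guard matters with "spawn": each worker re-imports the main script, and starting a pool at import time would fail.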

We've also seen issues with using FFTW inside multiprocessing. I don't know if that's because FFTW is invoking multiprocessing, or if it's somehow just not thread-safe itself. That said, if you're using FFTW, you can just ask FFTW itself to parallelize calculations across all your CPU cores, in which case there is no performance gain from wrapping that in another layer of parallelization.
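Following that advice with poppy's own settings might look like the sketch below: keep FFTW on and switch off poppy's process-level parallelism, so any parallel speedup comes from FFTW's threads instead. It assumes the astropy configuration items poppy.conf.use_fftw and poppy.conf.use_multiprocessing; whether FFTW actually runs multithreaded depends on how poppy and pyFFTW are set up, and any multiprocessing in the calling script (such as the gists above) would need to be removed separately. The helper name is hypothetical.

```python
try:
    import poppy  # POPPY, the package this issue is filed against
except ImportError:
    poppy = None  # lets the sketch run on machines without poppy


def use_fftw_without_multiprocessing():
    """Keep FFTW enabled but disable poppy's own process-level parallelism.

    Returns True if poppy was found and both settings were applied.
    """
    if poppy is None:
        return False
    poppy.conf.use_fftw = True  # assumed: poppy uses pyFFTW only if it can import it
    poppy.conf.use_multiprocessing = False  # no extra layer of worker processes
    return True
```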

ariedel commented 6 years ago

OK, that makes sense. I'm ready to consider this ticket resolved.