leginon-org / leginon-redmine-archive


FFT in redux of large micrographs is too slow #4891

Open leginonbot opened 3 months ago

leginonbot commented 3 months ago

Author Name: Neil Voss (@vosslab) Original Redmine Issue: 4891, https://emg.nysbc.org/redmine/issues/4891 Original Date: 2017-04-07 Original Assignee: Neil Voss


It would be great if you could also take a look at the FFT of the large images as these are very slow. There are two layers of problems on FFT:

  1. redux was originally written to be multi-threaded, but the FFTW implementation for Python had a memory leak when multi-threaded. There might be a fix for that now, or a better library to use.

  2. Large image FFT is slow by default, and K2 super resolution images are even worse. Currently the only way to make it faster is to create an FFTW wisdom plan. If this can be improved, please do so.

If you are going to make changes in there, keep in mind that K2 images are not square, nor are their dimensions FFT-friendly. We also need to crop to a square before the transform so that the result looks more like an optical diffraction pattern, with the same pixel size along both axes.
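A minimal sketch of the crop-then-transform step described above, assuming plain NumPy rather than the FFTW or fftpack code that redux actually uses. The function names `center_crop_square` and `power_spectrum` are hypothetical and are not taken from the redux or Appion source.

```python
import numpy as np

def center_crop_square(image):
    """Crop the largest centered square out of a 2-D array."""
    rows, cols = image.shape
    side = min(rows, cols)
    r0 = (rows - side) // 2
    c0 = (cols - side) // 2
    return image[r0:r0 + side, c0:c0 + side]

def power_spectrum(image):
    """Return the centered power spectrum of the square-cropped image.

    Cropping first means both frequency axes have the same sampling,
    so the result is not stretched along one axis.
    """
    square = center_crop_square(np.asarray(image, dtype=np.float32))
    square = square - square.mean()            # remove the DC offset
    ft = np.fft.fftshift(np.fft.fft2(square))  # zero frequency in the center
    return np.abs(ft) ** 2
```

For a non-square input such as a 70 x 50 test array, `power_spectrum` returns a 50 x 50 array. The same idea applies to a real K2 frame, only with much larger dimensions.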


I was looking over the FT code in Appion. It seems that both fftw3 and fftpack have been tried to speed this up. I see Anchi has played with the wisdom file to optimize the calculations. There is only so much you can do to speed up an FFT calculation.

Things that could be sped up with their caveats:

A. Figure out multi-threading. May just work in CentOS7 vs. CentOS6.

B. Change the shape. It appears that our image dimensions are not powers of 2, so we miss the fastest FFT (fast FT) code path and the calculation behaves more like a slower general DFT (discrete FT). Possible workarounds:

  1. Pad the micrograph into a power of 2 box, such as from 7k x 5k into 8192 x 8192 (slower in my tests).
  2. Do the four corners of the micrograph and average them (loss of outer resolution, but it would be minimal). Going from 5k x 7k to 4096 x 4096 was twice as fast, but if we do it 4 times we are doubling the total time.
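Option B.2 could look roughly like the sketch below. It assumes NumPy only, and the function name `corner_averaged_power` and the `box` parameter are hypothetical. Note that for a 5k x 7k image with a 4096 box, the four corner boxes overlap in the middle of the micrograph.

```python
import numpy as np

def corner_averaged_power(image, box=4096):
    """Average the power spectra of the four corner boxes of a micrograph."""
    image = np.asarray(image, dtype=np.float32)
    rows, cols = image.shape
    if box > rows or box > cols:
        raise ValueError("box is larger than the image")
    total = np.zeros((box, box), dtype=np.float64)
    # Top-left, top-right, bottom-left and bottom-right corners.
    for r0 in (0, rows - box):
        for c0 in (0, cols - box):
            tile = image[r0:r0 + box, c0:c0 + box]
            tile = tile - tile.mean()  # remove the DC offset per tile
            total += np.abs(np.fft.fft2(tile)) ** 2
    # Shift once at the end so zero frequency sits in the center.
    return np.fft.fftshift(total / 4.0)
```

This still runs four transforms, so it does not remove the doubling of total time mentioned above. It only shows what would be averaged.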

C. We could pre-calculate the FT and save it to disk. This could be done using a daemon running on a GPU node. Do we save the power spectrum only, or the complex form?
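As a rough sketch of option C, assuming NumPy on the CPU and a plain `.npy` file as the cache (no daemon, no GPU): the function name `cached_power_spectrum` and the `cache_path` argument are hypothetical. This version stores only the power spectrum as float32, which takes half the space of a complex64 array of the same shape.

```python
import os
import numpy as np

def cached_power_spectrum(image, cache_path):
    """Load a saved power spectrum if present, otherwise compute and save it.

    cache_path should end in ".npy", because np.save appends that
    extension when it is missing.
    """
    if os.path.exists(cache_path):
        return np.load(cache_path)
    data = np.asarray(image, dtype=np.float32)
    power = (np.abs(np.fft.fft2(data)) ** 2).astype(np.float32)
    np.save(cache_path, power)
    return power
```

Saving the complex form instead would keep the phases, at the cost of twice the disk space per image.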

D. We could go full GPU with this and require that the reduxd server be run on a GPU system. [Or, even crazier, use an FPGA/ASIC hardware FFT solution: http://www.dilloneng.com/2d-fft.html ; it handles 16-bit pixel data at a resolution of 2K x 2K pixels and a frame rate of 120 fps.]

I think it needs more discussion.

leginonbot commented 3 months ago

Original Redmine Comment Author Name: Neil Voss (@vosslab) Original Date: 2017-04-08T15:53:21Z


This sounds like something to try:

https://pypi.python.org/pypi/scikit-cuda