Optimize convolve_sensitive_float_matrix!

andreasnoack commented 7 years ago

The orange bars right below the tall (and widest) towers in the profile image seem to be this line and this line which are loading floating point values into a buffer. The two towers are the dft calculations so loading the buffers take way longer than computing the dft and the loading seems to consume about 40-45pct of the total runtime.

@jrevels and I looked a bit into this and our guess right now is that the loading is slow because of the two pointer loads and jumpy memory access pattern. This seems to be the main bottleneck right now and it can probably be optimized in various ways.

jeff-regier commented 7 years ago

@rgiordan -- There's a comment at the hot spot that suggests you've thought about avoiding the copy. Do you have ideas for how to do that? That loop and the one after it accounts for half the total runtime. The fft takes very little time in comparison.

        for h in h_range, w in w_range
          # TOOD: avoid this copy?
          fft_matrix[h, w] = sf_matrix[h, w].h[ind1, ind2]
        end

rgiordan commented 7 years ago

One somewhat invasive idea would be to make each element of a SensitiveFloat's Hessian matrix itself be a matrix of complex numbers (rather than a real, as it is now). The populate_fsm_vec functions would set the real part of the appropriate elements of the Hessian, and then you could do run safe_fft! directly on each element of the Hessian. Needless to say, as long as you're doing that, you may as well do it for the value and derivatives, too.

Another crazier idea (that would require more Julia-foo than I have available off the top of my head) would be to write an AbstractArray that mimics the behavior of a matrix of complex numbers with custom get and set methods to allow it to interact directly with the SensitiveFloat Hessian values. A side effect would be trashing the original SensitiveFloat, but that would be fine, I think. That seems kind of crazy and maybe not possible, but we're brainstorming I guess.

jeff-regier / Celeste.jl

Optimize convolve_sensitive_float_matrix! #486