pyxem / kikuchipy

Toolbox for analysis of electron backscatter diffraction (EBSD) patterns
https://kikuchipy.org
GNU General Public License v3.0

Implement matching in frequency space to reduce the dictionary size #414

Closed hakonanes closed 1 year ago

hakonanes commented 3 years ago

It would be very nice to provide pattern matching in frequency space to reduce the dictionary size, similar to parts of the Refined Template Matching (RTM) approach available in AstroEBSD.

I don't think our current refinement with SciPy can handle such large angular deviations, or the bounds would have to be too large to be feasible, so we would need to implement the iterative refinement described in the RTM paper (https://doi.org/10.1016/j.ultramic.2019.112845).

Things that are needed for all this (will add more as the work progresses):

pc494 commented 3 years ago

(given you did tag it "help wanted") - I would suggest using scipy over skimage for this application. That section of skimage has been a little fiddly in the past and scipy is (in general) really solid.

hakonanes commented 3 years ago

(Exactly these tips I had in mind, thanks!) Yes, I remember the API of register_translation() (or something similar) being broken in a patch release (v0.16.2 or so). Since the convolution will be done > 100 000 times in general, I think making stronger assumptions about array shapes and data types and using SciPy directly is the way to go.

din14970 commented 3 years ago

Had a quick look at this paper. I can imagine for EBSD the entire problem becomes again much more difficult and painful than say NBED. A few thoughts:

hakonanes commented 3 years ago
  • think immediately about the option of a GPU implementation, you will get massive speedups

Yeah, I've heard that your implementation of template matching using CuPy is very fast! I would try it if I either had a CUDA-enabled graphics card available (to use CuPy) or had the time to learn PyOpenCL...

  • last time I tested, scipy's convolve was unbearably slow, because I think it does real space kernel convolution. This is why skimage calculates the convolution directly through FFTs. Possibly there's also a clean and fast scipy implementation, e.g. here using method='fft'.

Thanks for the heads-up. convolve calls fftconvolve when it estimates that will be faster, as per the notes in its docstring, and correlate with method="fft" also calls fftconvolve. So in some cases they should run equally fast. We should of course time the various approaches to find the fastest one!
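As a quick sanity check (a minimal sketch using scipy.signal), the FFT and direct routes of correlate agree numerically, so either can be benchmarked against the other on realistic pattern sizes:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(42)
pattern = rng.random((60, 60))
template = rng.random((60, 60))

# Same correlation computed via FFTs and via direct summation
cc_fft = signal.correlate(pattern, template, mode="same", method="fft")
cc_direct = signal.correlate(pattern, template, mode="same", method="direct")
print(np.allclose(cc_fft, cc_direct))  # True
```

scipy.signal.choose_conv_method() can also be asked directly which method it would pick for given input shapes.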

  • I honestly don't understand the benefit of using a log-polar transform over a linear polar transform for matching the in-plane rotation. Log polar will blow up the importance of whatever is at the origin, as this region is sampled more densely. But the position of features further away is more sensitive to the rotation and should "weigh" more than whatever is at the center, no?

You might be right here. In many EBSD patterns, the projection/pattern center coordinates (PCx, PCy), i.e. the detector pixel closest to the beam/sample source point, are close to the detector center. The EBSD pattern on the detector is a gnomonic projection of a part of the Kikuchi sphere, so the pattern/projection will be more distorted away from the PC. Perhaps this is an argument for weighting the less distorted pixels more than the distorted ones? I don't know the authors' motivation for using a log polar transform. It might just be that it was easily available (the transform they use is someone else's implementation). This is a good point; we should try both and see which provides the best result.
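To try both, here is a minimal polar resampling sketch with scipy.ndimage.map_coordinates (the function name and grid choices are illustrative, not from any package; skimage.transform.warp_polar offers linear and log scaling out of the box):

```python
import numpy as np
from scipy import ndimage

def to_polar(image, n_r=64, n_theta=64, log=False):
    # Resample a square image onto a (theta, r) grid about its center.
    # log=True spaces the radial samples logarithmically, so the region
    # near the center (here, near the PC) is sampled more densely.
    h, w = image.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    r_max = min(cy, cx)
    # Log sampling starts at radius 1 to avoid the singularity at r = 0
    r = np.geomspace(1, r_max, n_r) if log else np.linspace(0, r_max, n_r)
    theta = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rows = cy + r[None, :] * np.sin(theta[:, None])
    cols = cx + r[None, :] * np.cos(theta[:, None])
    return ndimage.map_coordinates(image, [rows, cols], order=1)
```

Switching the log flag then lets the same downstream rotation matching be compared on both transforms.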

  • A big difference between NBED and EBSD templates is that the former can be represented by a sparse list of points, whereas the latter is necessarily a dense image. So to match the in-plane angle from (log) polar images, you will probably be best served by the FFT cross-correlation method as described in the paper. What I do in my implementation is "slide" the template across the pattern, since this involves only a few multiplications at each "angle"; transforming this to a dense image, taking the FFT and multiplying all pixels turned out to be less efficient.

Thanks for this insight. We don't want to mask the pattern in any way, I think, so a full FFT convolution for EBSD patterns will be the best option, as you say.

  • Sampling of orientation space will also be different, you will need much larger libraries.

The EBSD community is painfully aware of this fact: for dictionary indexing, meaning real space matching of experimental patterns against dictionary patterns (templates), you typically need an average misorientation of no more than ~1.5-2 degrees between dictionary patterns to find the best match. For the highest symmetry point group, m-3m, this means the dictionary needs 333 227 patterns using the excellent cubochoric orientation sampling available in EMsoft (the EMSampleRFZ program). The fact that Foden et al. only need a dictionary with an average misorientation of about 7 degrees (thanks to matching in frequency space) is the main argument for using their approach, in my opinion!
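A rough back-of-the-envelope estimate of the reduction, assuming the number of dictionary orientations scales with the inverse cube of the average misorientation (orientation space being three-dimensional):

```python
# Known point of reference: EMsoft's cubochoric sampling of the m-3m
# fundamental zone at ~1.5 degree average misorientation
n_fine = 333_227
theta_fine, theta_coarse = 1.5, 7.0

# Assumed cubic scaling with misorientation step (an estimate, not a
# number from the RTM paper)
n_coarse = n_fine * (theta_fine / theta_coarse) ** 3
print(round(n_coarse))  # on the order of a few thousand patterns
```

A two-orders-of-magnitude smaller dictionary is roughly what this scaling suggests.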

din14970 commented 3 years ago

You made me curious to benchmark this once again (I'd done this in a sloppy way in between things, but now I made a somewhat clean notebook to illustrate it): https://nbviewer.jupyter.org/github/din14970/rotation-registration-benchmark/blob/e29061c10d2d4e8366f9e99d986dca79e986765d/FFT_vs_shift.ipynb. The FFT approach that is most general and to my knowledge most efficient:

from scipy import fft

def fft_cc(image, template):
    # 1D FFTs along the rotation (angle) axis of the polar images
    fft_im = fft.rfftn(image, axes=[0])
    fft_temp = fft.rfftn(template, axes=[0])
    image_product = fft_im * fft_temp.conj()
    # Inverse transform gives the correlation at every cyclic rotation
    # shift; summing over the radial axis yields one score per rotation
    cross_correlation = fft.irfftn(image_product, axes=[0]).sum(axis=1)
    return cross_correlation

Here image and template are images both in (log) polar coordinates. As far as I understand FFTs, the complexity should be O(w*h*log(h)), with w and h the width and height of the polar images.

If one is dealing with a very sparse template that can be represented by a list of points, one can just loop over the non-zero pixels and shift them over the entire image. I did this in a vectorized and non-vectorized loopy way using numba. The complexity should be O(h*s) with h the height and s the number of non-zero pixels. In the worst case, if all pixels are non-zero, this approach will give a complexity of O(h^2w), which is significantly worse than the FFT approach.
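A NumPy-only sketch of that sparse-shift idea (the notebook's versions are numba-compiled; the names here are illustrative):

```python
import numpy as np

def sparse_rotation_cc(image, rows, cols, values):
    # image: dense (h, w) polar pattern.  rows/cols/values describe the
    # non-zero pixels of the sparse polar template.  For each cyclic
    # shift along the rotation (row) axis, sum template * shifted image;
    # only the s non-zero pixels are touched per shift, hence O(h*s).
    h = image.shape[0]
    cc = np.empty(h)
    for shift in range(h):
        cc[shift] = np.sum(values * image[(rows + shift) % h, cols])
    return cc
```

For a template with only a handful of non-zero pixels per pattern, the inner sum is tiny, which is why this beats the dense FFT route in the NBED case.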

However, for NBED patterns, there's like 100 spots per pattern, making this still the superior approach as you can see in the notebook.

The paper also says this in-plane rotation finding is only used for refinement, so you only need to check maybe +/- 10 degrees or so? It would be interesting to see whether the real space or Fourier space approach is faster in this case; I would wager the real space approach.
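For such a narrow window, the real-space variant reduces to checking a handful of cyclic shifts (an illustrative sketch, not the notebook's implementation):

```python
import numpy as np

def windowed_rotation_cc(image, template, max_shift):
    # Score only cyclic row shifts within +/- max_shift rows of zero
    # (small in-plane rotations), instead of the full FFT correlation.
    shifts = list(range(-max_shift, max_shift + 1))
    scores = [np.sum(template * np.roll(image, s, axis=0)) for s in shifts]
    best = int(np.argmax(scores))
    return shifts[best], scores[best]
```

With, say, 21 candidate shifts instead of all h, the number of multiplications drops accordingly, which is why the real-space route could win for refinement.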

Yeah, I've heard that your implementation of template matching using CuPy is very fast! I would try it if I either had a CUDA-enabled graphics card available (to use CuPy) or had the time to learn PyOpenCL...

Just try it out on Google Colab? This gives you access to a CUDA enabled GPU and cupy is installed by default. Only installing a proper environment including hyperspy and dev versions of stuff is a bit more of a hassle without access to a shell but I've done it.

Edit:

I've now also added GPU implementations of all methods. Note that neither data transfers to and from the GPU nor compilation of the GPU kernels are included in the timings.

hakonanes commented 3 years ago

You made me curious to benchmark this once again (I'd done this in a sloppy way in between things, but now I made a somewhat clean notebook to illustrate it): https://nbviewer.jupyter.org/github/din14970/rotation-registration-benchmark/blob/e29061c10d2d4e8366f9e99d986dca79e986765d/FFT_vs_shift.ipynb. The FFT approach that is most general and to my knowledge most efficient:

The notebook is very nice, and is a good guide to base this implementation on, thanks!

The paper also says this in-plane rotation finding is only used for refinement, so you only need to check maybe +/- 10 degrees or so? It would be interesting to see whether the real space or Fourier space approach is faster in this case; I would wager the real space approach.

I'm not entirely sure the required rotation is always within +/- 10 degrees; we would have to check this for all crystal systems before assuming so. If that is the case, we should definitely restrict the rotation search range to increase speed!

Just try it out on Google Colab? This gives you access to a CUDA enabled GPU and cupy is installed by default. Only installing a proper environment including hyperspy and dev versions of stuff is a bit more of a hassle without access to a shell but I've done it.

I'm sure that works great for trying out computations with GPU, but I don't want to develop code that way, as I run all tests locally all the time while adding new functionality.

I've now also added GPU implementations of all methods. Note that neither data transfers to and from the GPU nor compilation of the GPU kernels are included in the timings.

I saw that: very impressive speed-up! Many of the computations done in kikuchipy would benefit from the GPU. I just need easy access to one myself, or someone with easy access must start to contribute.

din14970 commented 3 years ago

someone with easy access must start to contribute

Hint hint ;) ? If you have something specific you could ping me and I could give it a shot. If code is vectorized and numpy based, cupy most likely will do the trick for free. If it's "loopy", things become more complicated.

hakonanes commented 3 years ago

Hint hint ;) ? If you have something specific you could ping me and I could give it a shot. If code is vectorized and numpy based, cupy most likely will do the trick for free. If it's "loopy", things become more complicated.

Totally! Thanks, I might ask you if I or someone finds time to work on this.

hakonanes commented 3 years ago

The approach taken in scikit-image's ~register_translation()~ phase_cross_correlation from Guizar-Sicairos et al. (2008) is the one used in the RTM approach, and I believe the one used in the HREBSD approach of Wilkinson et al. as well.

~register_translation() computes the error and phase difference, which I don't think we need here;~ We can skip the error and phase difference with phase_cross_correlation(); we just need the horizontal and vertical shifts and the maximum correlation score, and the latter ~register_translation()~ phase_cross_correlation() actually does not return... Because of this, and the fact that we can assume more about the input parameters than that function can, I think we should adapt that function to our needs.
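A stripped-down sketch of what such an adapted function could look like (plain, unnormalized FFT cross-correlation at integer-pixel precision; the name shift_and_score and the exact return convention are ours, not scikit-image's):

```python
import numpy as np
from scipy import fft

def shift_and_score(reference, moving):
    # Cross-correlate via the FFT; return the integer-pixel shift that,
    # applied to `moving` with np.roll, aligns it with `reference`, plus
    # the correlation maximum that phase_cross_correlation() omits.
    cc = fft.ifft2(fft.fft2(reference) * fft.fft2(moving).conj()).real
    peak = np.unravel_index(np.argmax(cc), cc.shape)
    dims = np.array(cc.shape)
    shift = np.array(peak, dtype=float)
    # Wrap shifts larger than half the image size to negative values
    shift[shift > dims / 2] -= dims[shift > dims / 2]
    return shift, cc[peak]
```

The subpixel upsampling step of Guizar-Sicairos et al. would be layered on top of this, refining the correlation only in a small neighbourhood around the integer peak.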

hakonanes commented 1 year ago

I'm closing this in an effort to focus work going forward.

I believe it is more important to improve the existing tools and interoperability with other packages (like AstroEBSD) to enable better workflows, rather than adding alternative implementations to something the package already can do (Hough indexing and pattern matching).

Please ask for this issue to be re-opened if someone wants to work on this.