Open uellue opened 1 year ago
The previous tests were run with parameters that turned out unusual, i.e. way too many reflections. Here's a more typical result, twice as fast for the whole simulation:
Thank you for this improvement, @uellue! The changes look good and the tests pass, so I'm happy to merge this PR.
Is it important for you to have this released soon? We don't have a release planned in the near future, as far as I'm aware, as the only current unreleased changes are maintenance updates. I plan to add some functionality to downstream kikuchipy in the next months, though. If I find anything to add or update to diffsims, I'll probably make a diffsims v0.6.0 release in connection with that.
A thought regarding speed of computations. While it's convenient to use class methods from diffpy.structure, they are not optimized for larger arrays, as you show here. If we replace more of these calls with our own computations, we could consider dropping diffpy.structure as a dependency.
Releasing can wait until it is convenient, no worries!
About replacing diffpy, their implementations didn't look so bad. This method was just a particular case because it still is THE hotspot of the entire simulation, intermediate values can be used for the return values instead of being recalculated, and the structure of the problem was perfect for broadcasting. The implementation here is pretty much equivalent to inlining the diffpy implementation. I also tried a Numba, but that didn't help. I guess that appending to a list of selected reflections depending on the norm is hard to vectorize for the compiler, so creating and then filtering the large intermediate arrays works well, too, in particular if they fit into the CPU caches. Broadcasting allows to only perform a minimum of operations on the full-size arrays.
If one wanted to speed this up further, I'd probably look into the structure of the whole simulation. I was wondering, shouldn't be the set of points in the sphere be the same for each rotation of the lattice? It feels so symmetric... In that case one could calculate a base version, select once, and then generate the rotated versions by rotating the selected part of the base? This sounds like a perfect job for GPUs, by the way.
This sounds like a perfect job for GPUs, by the way.
...resp. for GEMM on whatever platform -- should perform very well since it is a straight dot product with a rotation matrix.
If one wanted to speed this up further, I'd probably look into the structure of the whole simulation. I was wondering, shouldn't be the set of points in the sphere be the same for each rotation of the lattice? It feels so symmetric... In that case one could calculate a base version, select once, and then generate the rotated versions by rotating the selected part of the base? This sounds like a perfect job for GPUs, by the way.
Is there even a good reason that this is slow? I've been looking at this and I think @uellue is correct.
If you actually look at the code:
First the orientations are iterated through https://github.com/pyxem/diffsims/blob/fae5ca573b6ecb521d0f11fdce0f6a1e1695f6b6/diffsims/generators/library_generator.py#L119-L128
Then the function get_points_in_sphere
is called a bunch of times
https://github.com/pyxem/diffsims/blob/fae5ca573b6ecb521d0f11fdce0f6a1e1695f6b6/diffsims/generators/diffraction_generator.py#L243-L259
and then it is rotated....
So we can just call get_points_in_sphere
once? ....
@CSSFrancis, regardless, should we merge this and move on?
@hakonanes The get_points_in_sphere
method is going to be removed and the bigger problem is related to the fact that in this method there are repeated calls to get_points_in_sphere
when there don't need to be.
It might be worth looking at the ReciporicalLatticeVector.from_min_dspacing
method to see if there are changes there that could be sped up.
When profiling the performance of diffraction pattern simulation, most time was spent in this function.
This implementation runs more than three times faster since it reduces the calculation effort through broadcasting and re-uses coordinate and distance calculations.
Instead of using diffpy, it performs the equivalent transformation to cartesian and calculation of the norm directly to allow these optimizations.
Description of the change
Equivalent calculation using
numpy.ogrid()
and broadcasting to calculate hkl map, cartesian vectors, norm and selection to radius.Progress of the PR
Code style is dead link:
Minimal example of the bug fix or new feature
For reviewers
__init__.py
.CHANGELOG.rst
.credits
indiffsims/release_info.py
and in.zenodo.json
.