pyxem / orix

Analysing crystal orientations and symmetry in Python
https://orix.readthedocs.io
GNU General Public License v3.0

Chunk size in get_distance_matrix #429

Closed (closed by maclariz 1 year ago)

maclariz commented 1 year ago

This is just a wee note that I am finding that setting a large chunk size makes get_distance_matrix much, much faster. By that I mean using 1000 rather than the default of 20. That may not work on every computer, as not everyone has that much RAM, but I suspect that advising people to increase the chunk size until it fails would be a good thing and would save a lot of computation time. Also, the GitHub page mentions nothing about setting "lazy=True", which is essential for larger datasets.

Ian
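
The "try this upwards until it fails" approach described above could be sketched roughly as follows. This is only an illustration: `time_chunk_sizes`, `compute` and `fake_compute` are hypothetical names that are not part of orix, and the stand-in workload simply pretends that chunk sizes above 500 run out of memory.

```python
import time


def time_chunk_sizes(compute, chunk_sizes):
    """Time compute(chunk_size) for increasing sizes until one fails.

    `compute` is a hypothetical callable standing in for the real
    computation, e.g. (assuming the parameters named in this thread)
    lambda n: ori.get_distance_matrix(lazy=True, chunk_size=n)
    """
    timings = {}
    for size in chunk_sizes:
        start = time.perf_counter()
        try:
            compute(size)
        except MemoryError:
            # Stop at the first size that does not fit in memory
            break
        timings[size] = time.perf_counter() - start
    return timings


def fake_compute(chunk_size):
    # Stand-in workload: pretends anything above 500 exhausts memory
    if chunk_size > 500:
        raise MemoryError
    sum(range(1000))


timings = time_chunk_sizes(fake_compute, [20, 100, 500, 1000, 2000])
print(sorted(timings))  # chunk sizes that completed
```

Note that a real out-of-memory condition may kill the process rather than raise `MemoryError`, so it is probably safer to run such a search on a small subset of the data while watching memory use.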

hakonanes commented 1 year ago

Using larger chunks should reduce computation time at the expense of increased memory use, as you noticed. All parameters to all public classes, functions and methods are documented in the API reference. A short note about this trade-off is also included in the orientation clustering tutorial. However, I guess we could explain this trade-off better in the API reference.

The default of 20 orientations per chunk is a conservative number, as we don't want anyone to encounter a memory issue when doing the computation lazily (say, on a computer with 4 GB of RAM that is busy with other tasks as well as this computation).
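
To get a feel for why a small default is conservative, here is a back-of-the-envelope estimate of how memory per chunk might grow with chunk size. It is a hypothetical model, not a description of orix's internal memory use: it assumes the chunk size applies along each axis of the pairwise matrix, that each pair is expanded into `n_equivalents` symmetrically equivalent quaternions (24 is an illustrative value, the number of proper rotations in a cubic point group), and that each quaternion is stored as four 64-bit floats.

```python
def chunk_bytes(chunk_size, n_equivalents=24, floats_per_item=4, bytes_per_float=8):
    """Rough size in bytes of one chunk_size x chunk_size block of pairs.

    Hypothetical model: every argument except chunk_size is an
    illustrative assumption, not a value taken from orix.
    """
    return chunk_size**2 * n_equivalents * floats_per_item * bytes_per_float


for size in (20, 100, 1000):
    print(f"chunk_size={size:>4}: ~{chunk_bytes(size) / 1e6:.1f} MB per chunk")
```

Under these assumptions the memory per chunk grows with the square of the chunk size, so going from 20 to 1000 multiplies it by 2500. Actual usage also depends on how many chunks are held in memory at once.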

hakonanes commented 1 year ago

Thank you for raising this issue, @maclariz. We've expanded the relevant docstrings to explain these observations in more detail.