pyxem / orix

Analysing crystal orientations and symmetry in Python
https://orix.readthedocs.io
GNU General Public License v3.0
80 stars 45 forks source link

Consistent lazy computations for demanding tasks #326

Open hakonanes opened 2 years ago

hakonanes commented 2 years ago

The outer products between quaternion-quaternion or quaternion-vector are usually the most demanding tasks, I find. Having lazy computations for them all would be convenient. I had a look at which methods don't have a lazy implementation when reviewing #325, and it seems like there isn't much work required to support lazy computations here as well.

Supports lazy computations (including #317 and #325):

Computing lazy dot products with dot() and dot_outer() will return Dask arrays, so care must be taken to document this properly. Lazy implementations of these should be as simple as replacing np with da, and supporting the extra lazy API parameters lazy, chunksize and progressbar.

harripj commented 2 years ago

I agree with adding Dask for these demanding tasks, certainly functions that involve outer products. Do you think we should also do the same for dot()?

I was thinking about implementing a lazy implementation of dot_outer() in #325, but left that PR to be a small enhancement. Glad to see we are on the same wavelength about this!

Computing lazy dot products with dot() and dot_outer() will return Dask arrays, so care must be taken to document this properly. Lazy implementations of these should be as simple as replacing np with da, and supporting the extra lazy API parameters lazy, chunksize and progressbar.

I think that whilst we should allow for Dask computation, we should not expose the user to Dask arrays at any point (they will still be able to access the private functions if required). This is how it is handled in #325 and also #317, and it keeps a simple API.

hakonanes commented 2 years ago

We can identify and keep track of the methods in this issue. I didn't add a milestone because we can implement them when we need for them.

Do you think we should also do the same for dot()?

Yes, even though there might be few cases where a dot product computation fills memory. dot() is used in angle_with(), so there might be other cases with the latter where a computation fills memory.

we should not expose the user to Dask arrays at any point

I agree when the user is an end user. When the user is another package, like kikuchipy, diffsims or pyxem, where, say, we wanted to get dot products or angles but not compute them immediately, I think having a public API returning a Dask array would be benefitial. Alternatively we could use the private functions, but that goes against "general rules" when using package dependencies. It's not necessary at the moment, but we shouldn't rule it out.

harripj commented 2 years ago

I agree when the user is an end user. When the user is another package, like kikuchipy, diffsims or pyxem, where, say, we wanted to get dot products or angles but not compute them immediately, I think having a public API returning a Dask array would be benefitial. Alternatively we could use the private functions, but that goes against "general rules" when using package dependencies. It's not necessary at the moment, but we shouldn't rule it out.

Good point, if you envisage these packages needing this functionality then we should build it in. Perhaps it could be as simple as adding a flag argument return_dask or similar to the function?

hakonanes commented 2 years ago

return_dask or similar to the function

HyperSpy introduced the boolean parameter lazy_output in their latest release (see docs), which I think is understandable enough. What do you think?

harripj commented 2 years ago

HyperSpy introduced the boolean parameter lazy_output in their latest release (see docs), which I think is understandable enough. What do you think?

Makes sense to me! It's a good align syntax with a well-used package too.