pyxem / kikuchipy

Toolbox for analysis of electron backscatter diffraction (EBSD) patterns
https://kikuchipy.org
GNU General Public License v3.0

Use dask's map_blocks to get principal components and loadings with IncrementalPCA #92

Closed hakonanes closed 4 years ago

hakonanes commented 4 years ago

Describe the bug

Dask throws this warning when iterating over the data matrix using IncrementalPCA.partial_fit(X[start:end]):

FutureWarning: The `numpy.may_share_memory` function is not implemented by
Dask array. You may want to use the da.map_blocks function or something similar
to silence this warning.  Your code may stop working in a future release.

So, we should update the IncrementalPCA object by calling da.map_blocks on appropriate chunks. We lose the tqdm progress bar, but can use Dask's own instead. We should add a description to each progress bar, since two will pop up: one when getting the factors and one when getting the loadings.
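A rough sketch of what this could look like, not the code that ended up in kikuchipy. The data matrix X, its shape and chunking, and n_components are made up for illustration. Because partial_fit updates the estimator in place, the fitting pass below is a plain loop over Dask blocks (each computed to a NumPy array first) rather than map_blocks, which is an assumption made here to avoid concurrent updates of the same estimator. The loadings are then obtained with map_blocks, since transform does not change the fitted estimator, and Dask's ProgressBar replaces tqdm for that pass.

```python
import dask.array as da
import numpy as np
from dask.diagnostics import ProgressBar
from sklearn.decomposition import IncrementalPCA

# Hypothetical data matrix: 1000 patterns flattened to 64 pixels each,
# chunked along the sample axis only (one chunk across all features).
rng = np.random.default_rng(0)
X = da.from_array(rng.random((1000, 64)), chunks=(100, 64))

n_components = 4  # illustrative value
ipca = IncrementalPCA(n_components=n_components)

# Factors: feed one block at a time to partial_fit. Each block is computed
# to a NumPy array first, so scikit-learn never sees a Dask array and the
# numpy.may_share_memory warning is not triggered.
for i in range(X.numblocks[0]):
    ipca.partial_fit(X.blocks[i].compute())
factors = ipca.components_.T  # shape (n_features, n_components)

# Loadings: transform leaves the fitted estimator unchanged, so it can be
# mapped over the blocks lazily. The output chunks are given explicitly
# because the feature axis shrinks to n_components.
loadings_lazy = X.map_blocks(
    ipca.transform,
    dtype=X.dtype,
    chunks=(X.chunks[0], (n_components,)),
    meta=np.empty((0, 0), dtype=X.dtype),
)

print("Getting loadings:")  # stand-in for a progress bar description
with ProgressBar():
    loadings = loadings_lazy.compute()
```

The fitting loop runs the blocks sequentially, so only the loadings pass benefits from Dask's scheduler here.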

To Reproduce

Perform decomposition on a lazy signal with s.decomposition(algorithm='IPCA', output_dimension=1).

hakonanes commented 4 years ago

Fixed in #93 by... removing our own implementation of IncrementalPCA. We ended up using HyperSpy's implementation after some local profiling.